r - Web scraping with R

Question

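I want to use R to build a web scraper for the website "https://www.latlong.net/convert-address-to-lat-long.html". The scraper would submit an address to the site and then read back the latitude and longitude it generates. This would be repeated for every row of a dataset I have.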

Since I'm new to web scraping, I'm looking for some guidance.

Thanks in advance.

score 0 · Accepted Answer

In the past I have used an API called IP Stack (ipstack.com).

Example: a data frame `d` with a column of IP addresses named `ipAddress`.

# d: data frame with a column 'ipAddress'; pre-allocate the result column
d$ipCountry <- NA_character_

for (i in seq_len(nrow(d))) {
  # get data from the API and save the response lines to the variable 'str'
  lookupPath <- paste0("http://api.ipstack.com/", d$ipAddress[i],
                       "?access_key=INSERT YOUR API KEY HERE&format=1")
  str <- readLines(lookupPath, warn = FALSE)

  # save the raw response to its own file (1.txt, 2.txt, ...)
  writeLines(str, paste0(i, ".txt"))

  # save the country line (line 7 of the response) to the main data frame 'd' as well
  d$ipCountry[i] <- str[7]
  print(paste("Successfully saved ip #:", i))
}

In this example I was specifically after the country of each IP, which appears on line 7 of the data returned by the API (hence str[7]).
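Picking line 7 works only as long as the API keeps its fields in the same order, and it stores the whole line (key, quotes and trailing comma included) rather than just the country. A slightly sturdier option is to look the field up by its key. The sketch below assumes the response has one `"key": value` pair per line, which is what makes `str[7]` work in the first place; the helper name `get_json_field` is hypothetical, and the key `country_name` is an assumption you should check against one of your saved `.txt` files.

```r
# Hypothetical helper: extract a field from a line-per-field JSON response
# by its key instead of by its line position.
# Assumption: each "key": value pair sits on its own line.
get_json_field <- function(lines, key) {
  pattern <- paste0('^\\s*"', key, '"\\s*:\\s*')
  hit <- grep(pattern, lines)
  if (length(hit) == 0) return(NA_character_)  # key not present
  value <- sub(pattern, "", lines[hit[1]])     # drop the key and the colon
  value <- sub(",\\s*$", "", value)            # drop a trailing comma
  gsub('^"|"$', "", value)                     # drop surrounding quotes
}
```

Inside the loop you would then write `d$ipCountry[i] <- get_json_field(str, "country_name")`. For anything beyond a single flat field, a proper JSON parser (for example the jsonlite package) is the more robust choice.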

This API lets you look up 10,000 addresses per month for free, which was enough for my purposes.
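With a monthly quota, re-running the loop after an interruption would spend lookups again on rows that were already fetched. Since the loop writes each response to `i.txt`, one way to avoid this is to reuse those files. The sketch below is an illustration, not part of the original answer: `cached_lookup` is a hypothetical helper, and `fetch` is any zero-argument function that performs the actual request.

```r
# Hypothetical helper: return the saved response for row i if "i.txt"
# already exists; otherwise call fetch() once, save the result, and return it.
cached_lookup <- function(i, fetch, dir = ".") {
  path <- file.path(dir, paste0(i, ".txt"))
  if (file.exists(path)) {
    return(readLines(path, warn = FALSE))  # reuse the cached copy, no API call
  }
  resp <- fetch()                          # one API call for an uncached row
  writeLines(resp, path)
  resp
}
```

In the loop, `str <- cached_lookup(i, function() readLines(lookupPath))` would replace both the `readLines` call and the separate file-writing step.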
