
I'm trying to build a data frame from a series of API calls. Each call returns some JSON like this:

{"ip":"83.108.241.206","country_code":"NO","country_name":"Norway","region_code":"15","region_name":"Sogn og Fjordane","city":"Øvre Årdal","zipcode":"6884","latitude":61.3167,"longitude":7.8,"metro_code":"","area_code":""}

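I'd like to compile a bunch of these calls into a single data frame with columns "ip", "country_code", and so on, but I can't find an efficient way to convert each response into a form I can call rbind on.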

I'm making the API calls from a vector of URLs, like this:

> urls <- c("http://freegeoip.net/json/83.108.241.206", "http://freegeoip.net/json/129.118.15.107","http://freegeoip.net/json/189.144.59.71", "http://freegeoip.net/json/24.106.181.190", "http://freegeoip.net/json/213.226.181.3", "http://freegeoip.net/json/84.1.204.89")

> urls
[1] "http://freegeoip.net/json/83.108.241.206"
[2] "http://freegeoip.net/json/129.118.15.107"
[3] "http://freegeoip.net/json/189.144.59.71" 
[4] "http://freegeoip.net/json/24.106.181.190"
[5] "http://freegeoip.net/json/213.226.181.3" 
[6] "http://freegeoip.net/json/84.1.204.89" 

What's the best way to get from the URLs to JSON to a data frame?


1 Answer


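I'm reproducing the "transcript" so you can see the intermediate values and a few of the mistakes I made along the way. It isn't hard with a couple of tools: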

> require(RJSONIO)    # Used version 1.3-0
> require(downloader)  # version 0.3
# probably not necessary, but it can handle a wider range of URL types
Loading required package: downloader
> urls <- c("http://freegeoip.net/json/83.108.241.206",
 "http://freegeoip.net/json/129.118.15.107",
"http://freegeoip.net/json/189.144.59.71", 
"http://freegeoip.net/json/24.106.181.190", 
"http://freegeoip.net/json/213.226.181.3", 
"http://freegeoip.net/json/84.1.204.89")
> 
> download(urls[1], "temp")
100   225  100   225    0     0   1301      0 --:--:-- --:--:-- --:--:--  2710    0 --:--:-- --:--:-- --:--:--     0
# Experience tells me to use `quiet=TRUE` 
#  to prevent bad interactions with my GUI console display

> df <- fromJSON(file("temp"))  ####   See below for improved strategy  ###
> str(df)
List of 11
 $ ip          : chr "83.108.241.206"
 $ country_code: chr "NO"
 $ country_name: chr "Norway"
 $ region_code : chr "15"
 $ region_name : chr "Sogn og Fjordane"
 $ city        : chr "Øvre Årdal"
 $ zipcode     : chr "6884"
 $ latitude    : num 61.3
 $ longitude   : num 7.8
 $ metro_code  : chr ""
 $ area_code   : chr ""
> str(as.data.frame(df))
'data.frame':   1 obs. of  11 variables:
 $ ip          : Factor w/ 1 level "83.108.241.206": 1
 $ country_code: Factor w/ 1 level "NO": 1
 $ country_name: Factor w/ 1 level "Norway": 1
 $ region_code : Factor w/ 1 level "15": 1
 $ region_name : Factor w/ 1 level "Sogn og Fjordane": 1
 $ city        : Factor w/ 1 level "Øvre Årdal": 1
 $ zipcode     : Factor w/ 1 level "6884": 1
 $ latitude    : num 61.3
 $ longitude   : num 7.8
 $ metro_code  : Factor w/ 1 level "": 1
 $ area_code   : Factor w/ 1 level "": 1
> str(as.data.frame(df, stringsAsFactors=FALSE))
'data.frame':   1 obs. of  11 variables:
 $ ip          : chr "83.108.241.206"
 $ country_code: chr "NO"
 $ country_name: chr "Norway"
 $ region_code : chr "15"
 $ region_name : chr "Sogn og Fjordane"
 $ city        : chr "Øvre Årdal"
 $ zipcode     : chr "6884"
 $ latitude    : num 61.3
 $ longitude   : num 7.8
 $ metro_code  : chr ""
 $ area_code   : chr ""
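To see the difference outside the transcript, here is a minimal sketch that converts one parsed response both ways. The named list `parsed` is a hypothetical stand-in for what `fromJSON()` returns, trimmed to five of the eleven fields. Since R 4.0.0 the default for `stringsAsFactors` is `FALSE`, so the explicit argument mainly matters on older versions like the one used in this answer; passing it explicitly keeps the behaviour the same everywhere.

```r
# Hypothetical stand-in for one parsed response (values copied from the
# transcript above; in practice this list would come from fromJSON()).
parsed <- list(ip = "83.108.241.206", country_code = "NO",
               region_code = "15", latitude = 61.3167, longitude = 7.8)

# Same list, converted with and without factor conversion of strings.
with_factors <- as.data.frame(parsed, stringsAsFactors = TRUE)
with_chars   <- as.data.frame(parsed, stringsAsFactors = FALSE)

sapply(with_factors, class)  # string fields are "factor", coordinates "numeric"
sapply(with_chars, class)    # string fields stay "character"
```

Keeping the text fields as character also preserves values such as the zero-padded region code "09" exactly as the API sent them.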

That's the preparation. The first call to rbind will mess things up if you keep those columns as factors:

> df <- as.data.frame( fromJSON(file("temp")), stringsAsFactors=FALSE )
> for ( i in 2:length(urls) ) {
+   download(urls[i], "temp", quiet=TRUE)      # fetch the next response into "temp"
+   df <- rbind( df, fromJSON(file("temp")) )  # append it as a new row
+ }
> df
   ip               country_code country_name    region_code region_name        
df "83.108.241.206" "NO"         "Norway"        "15"        "Sogn og Fjordane" 
   "129.118.15.107" "US"         "United States" "TX"        "Texas"            
   "189.144.59.71"  "MX"         "Mexico"        "09"        "Distrito Federal" 
   "24.106.181.190" "US"         "United States" "NC"        "North Carolina"   
   "213.226.181.3"  "LT"         "Lithuania"     "57"        "Kauno Apskritis"  
   "84.1.204.89"    "HU"         "Hungary"       "12"        "Komárom-Esztergom"
   city         zipcode latitude longitude metro_code area_code
df "Øvre Årdal" "6884"  61.3167  7.8       ""         ""       
   "Lubbock"    "79409" 33.61    -101.8213 "651"      "806"    
   "Mexico"     ""      19.4342  -99.1386  ""         ""       
   "Raleigh"    "27604" 35.8181  -78.5636  "560"      "919"    
   "Kaunas"     ""      54.9     23.9      ""         ""       
   "Környe"     ""      47.5467  18.3208   ""         ""     

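For comparison, here is a small sketch of what `rbind()` does when it is handed two plain named lists instead of data frames: the result is a matrix whose cells are list elements (a "list-matrix"), with row names taken from the argument names, which resembles the printed result above with its `df` row label and quoted entries. The lists `a` and `b` are hypothetical stand-ins for two parsed responses, trimmed to three fields.

```r
# Two hypothetical parsed responses as plain named lists
# (values copied from the transcript output).
a <- list(ip = "83.108.241.206", country_code = "NO", latitude = 61.3167)
b <- list(ip = "129.118.15.107", country_code = "US", latitude = 33.61)

m <- rbind(a, b)
is.matrix(m)  # TRUE: rbind() of two lists builds a matrix...
is.list(m)    # TRUE: ...whose cells are list elements, not atomic columns
m[, "ip"]     # extracting a "column" still gives a list, not a character vector

# Coercing each response to a one-row data frame first gives ordinary columns.
d <- rbind(as.data.frame(a, stringsAsFactors = FALSE),
           as.data.frame(b, stringsAsFactors = FALSE))
str(d)
```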
Adding the coercion to the data.frame class with stringsAsFactors=FALSE prevents the rbind() operation from creating a list-matrix or running into trouble when binding the rows.
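A common alternative to growing `df` inside a `for` loop is to build a list of one-row data frames with `lapply()` and bind them all at once with `do.call(rbind, ...)`. The sketch below assumes a hypothetical `fetch` argument: any function that takes one URL and returns the parsed JSON as a named list (with the packages loaded above, something like `function(u) fromJSON(file(u))` might serve, but that is an untested assumption). Failed requests are caught with `tryCatch()` and skipped with a warning. The usage lines run offline against a hypothetical stub, `fake_api()`, rather than the real service.

```r
# Hypothetical helper: `fetch` takes one URL and returns the parsed JSON
# as a named list (an assumption; it is not part of any package).
urls_to_df <- function(urls, fetch) {
  rows <- lapply(urls, function(u) {
    parsed <- tryCatch(fetch(u), error = function(e) {
      # Skip this URL but say why, instead of aborting the whole batch.
      warning("skipping ", u, ": ", conditionMessage(e), call. = FALSE)
      NULL
    })
    if (is.null(parsed)) return(NULL)
    # One-row data frame with character (not factor) text columns.
    as.data.frame(parsed, stringsAsFactors = FALSE)
  })
  # Drop failed requests, then bind every row in a single call.
  do.call(rbind, Filter(Negate(is.null), rows))
}

# Offline stub standing in for the real API: it "fails" for 0.0.0.0 and
# otherwise echoes the IP taken from the end of the URL.
fake_api <- function(u) {
  if (grepl("0\\.0\\.0\\.0$", u)) stop("no response")
  list(ip = sub(".*/", "", u), country_code = "XX", latitude = 0, longitude = 0)
}

result <- suppressWarnings(urls_to_df(
  c("http://example.invalid/json/1.2.3.4",
    "http://example.invalid/json/0.0.0.0",
    "http://example.invalid/json/5.6.7.8"),
  fake_api))
str(result)
```

If every request fails, the function returns NULL, because `rbind()` called with no arguments returns NULL.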

answered 2014-08-07T21:42:17.087