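I'm looking for help writing some R code that iterates through the rows of a data frame, passes the values in each row to a function, and prints the output to an Excel file, a txt file, or just the console.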

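The goal is to automate a batch of distance/time queries (several hundred) to Google Maps using the function from this website: http://www.nfactorialanalytics.com/r-vignette-for-the-week-finding-time-distance-between-two-places/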

The function from that site is as follows:

library(XML)
library(RCurl)
distance2Points <- function(origin,destination){
 results <- list();
 xml.url <- paste0('http://maps.googleapis.com/maps/api/distancematrix/xml?origins=',origin,'&destinations=',destination,'&mode=driving&sensor=false')
 xmlfile <- xmlParse(getURL(xml.url))
 dist <- xmlValue(xmlChildren(xpathApply(xmlfile,"//distance")[[1]])$value)
 time <- xmlValue(xmlChildren(xpathApply(xmlfile,"//duration")[[1]])$value)
 distance <- as.numeric(sub(" km","",dist))
 time <- as.numeric(time)/60
 distance <- distance/1000
 results[['time']] <- time
 results[['dist']] <- distance
 return(results)
}

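The data frame will contain two columns: origin postal code and destination postal code (Canada, eh?). I'm a beginner R programmer, so I know how to load a txt file into a data frame with read.table. I'm just not sure how to iterate over the data frame, passing the values from each row to the distance2Points function and executing it. I think this could be done with a for loop or one of the apply calls?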

Thanks for your help!

EDIT:

To keep things simple, suppose I turn these two vectors into a data frame:

> a <- c("L5B4P2","L5B4P2")
> b <- c("M5E1E5", "A2N1T3")
> postcodetest <- data.frame(a,b)
> postcodetest
       a      b
1 L5B4P2 M5E1E5
2 L5B4P2 A2N1T3

How should I iterate over these two rows to get the distance and time back from the distance2Points function?


1 Answer


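Here is one way to do this: use lapply to generate a list containing the result for each row of your data, then use Reduce(rbind, [yourlist]) to concatenate that list into a data frame whose rows correspond to the rows of the original. To make this work, we also have to adjust the code in the original function so that it returns a one-row data frame, which I've done here.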

distance2Points <- function(origin,destination){

  require(XML)
  require(RCurl)

  xml.url <- paste0('http://maps.googleapis.com/maps/api/distancematrix/xml?origins=',origin,'&destinations=',destination,'&mode=driving&sensor=false')
  xmlfile <- xmlParse(getURL(xml.url))
  dist <- xmlValue(xmlChildren(xpathApply(xmlfile,"//distance")[[1]])$value)
  time <- xmlValue(xmlChildren(xpathApply(xmlfile,"//duration")[[1]])$value)
  distance <- as.numeric(sub(" km","",dist))
  time <- as.numeric(time)/60
  distance <- distance/1000
  # this gives you a one-row data frame instead of a list, b/c it's easy to rbind
  results <- data.frame(time = time, distance = distance)
  return(results)
}

# now apply that function rowwise to your data, using lapply, and roll the results
# into a single data frame using Reduce(rbind)
results <- Reduce(rbind, lapply(seq(nrow(postcodetest)), function(i)
  distance2Points(postcodetest$a[i], postcodetest$b[i])))

The results when applied to your example data:

> results
        time distance
1   27.06667   27.062
2 1797.80000 2369.311
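The row-wise pattern above can be tried without calling the Google API. The following is a minimal sketch, assuming a hypothetical stand-in function, fake_distance2Points, that is not part of the original code and returns made-up numbers in a one-row data frame. It uses do.call(rbind, ...), a common alternative to Reduce(rbind, ...), and writes the combined table to a temporary CSV file, since the question asked about file output.

```r
# Example data from the question
postcodetest <- data.frame(a = c("L5B4P2", "L5B4P2"),
                           b = c("M5E1E5", "A2N1T3"),
                           stringsAsFactors = FALSE)

# Hypothetical stand-in for distance2Points(): no network call, made-up values
fake_distance2Points <- function(origin, destination) {
  data.frame(time = nchar(origin) * 10,
             distance = nchar(destination) * 100)
}

# Build one one-row data frame per input row
row_results <- lapply(seq_len(nrow(postcodetest)), function(i)
  fake_distance2Points(postcodetest$a[i], postcodetest$b[i]))

# Stack the list into a single data frame
results <- do.call(rbind, row_results)

# Put inputs and outputs side by side and write them to a CSV file
out <- cbind(postcodetest, results)
out_file <- tempfile(fileext = ".csv")
write.csv(out, out_file, row.names = FALSE)
```

Swapping the stand-in for the real distance2Points should give the same shape of output. seq_len(nrow(x)) is used instead of seq(nrow(x)) so that an empty data frame produces zero iterations.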

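If you would rather do this without creating a new object, you could also write separate functions for computing time and distance (or a single function that takes the desired output as an option) and then use sapply, or just mutate, to create new columns in the original data frame. Here is how that looks using sapply: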

distance2Points <- function(origin, destination, output){

  require(XML)
  require(RCurl)

  xml.url <- paste0('http://maps.googleapis.com/maps/api/distancematrix/xml?origins=',
                    origin, '&destinations=', destination, '&mode=driving&sensor=false')

  xmlfile <- xmlParse(getURL(xml.url))

  if(output == "distance") {

    y <- xmlValue(xmlChildren(xpathApply(xmlfile,"//distance")[[1]])$value)
    y <- as.numeric(sub(" km", "", y))/1000

  } else if(output == "time") {

    y <- xmlValue(xmlChildren(xpathApply(xmlfile,"//duration")[[1]])$value)
    y <- as.numeric(y)/60

  } else {

    y <- NA    

  }

  return(y)

}

postcodetest$distance <- sapply(seq(nrow(postcodetest)), function(i)
  distance2Points(postcodetest$a[i], postcodetest$b[i], "distance"))

postcodetest$time <- sapply(seq(nrow(postcodetest)), function(i)
  distance2Points(postcodetest$a[i], postcodetest$b[i], "time"))
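As a variation on the sapply calls above, mapply can walk the two columns in parallel so that no row index is needed. This is only a sketch: fake_distance2Points is a hypothetical stand-in with the same three-argument shape as the function above, returning made-up numbers instead of querying the API.

```r
# Example data from the question
postcodetest <- data.frame(a = c("L5B4P2", "L5B4P2"),
                           b = c("M5E1E5", "A2N1T3"),
                           stringsAsFactors = FALSE)

# Hypothetical stand-in for distance2Points(origin, destination, output)
fake_distance2Points <- function(origin, destination, output) {
  if (output == "distance") {
    nchar(destination) * 100   # made-up value, not a real distance
  } else if (output == "time") {
    nchar(origin) * 10         # made-up value, not a real duration
  } else {
    NA
  }
}

# mapply pairs a[1] with b[1], a[2] with b[2], and so on;
# MoreArgs supplies the argument that stays constant across rows
postcodetest$distance <- mapply(fake_distance2Points,
                                postcodetest$a, postcodetest$b,
                                MoreArgs = list(output = "distance"),
                                USE.NAMES = FALSE)

postcodetest$time <- mapply(fake_distance2Points,
                            postcodetest$a, postcodetest$b,
                            MoreArgs = list(output = "time"),
                            USE.NAMES = FALSE)
```

USE.NAMES = FALSE keeps mapply from naming the result after the origin strings, so the new columns are plain numeric vectors.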

And here is how you would do it with mutate in a dplyr pipeline:

library(dplyr)

postcodetest <- postcodetest %>%
  mutate(distance = sapply(seq(nrow(postcodetest)), function(i)
           distance2Points(a[i], b[i], "distance")),
         time = sapply(seq(nrow(postcodetest)), function(i)
           distance2Points(a[i], b[i], "time")))
answered 2016-11-03T18:02:52.350