r - Efficiency of programmatic assignment in R

Question

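In short, I have a script that imports a large amount of data stored in several txt files. Within a single file, not all rows go into the same table (a DF, now being switched to DT), so for each file I select all the rows that belong to the same DF, get the DF, and assign the rows to it.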

The first time I create a DF named table1, I do this:

name <- "table1" # in my code the value of name will depend on different factors
                 # and **not** known in advance
assign(name, someRows)

Then, later in the run, my code may find (in other files) more rows that belong in the table1 data frame, so:

name <- "table1"
assign(name, rbind.fill(get(name), someRows))  # rbind.fill() is from plyr

My question is: is this assign()/get() pattern the best way to do programmatic assignment? Thanks

Edit:

Here is a simplified version of my code (each item in dataSource is the result of read.table() on one text file):

set.seed(1)
#
dataSource <- list(data.frame(fileType = rep(letters[1:2], each=4),
                              id       = rep(LETTERS[1:4], each=2),
                              var1     = as.integer(rnorm(8))),
                   data.frame(fileType = rep(letters[1:2], each=4),
                              id       = rep(LETTERS[1:4], each=2),
                              var1     = as.integer(rnorm(8))))
#
library(plyr)
#
tablesnames <- unique(unlist(lapply(dataSource,function(x) as.character(unique(x[,1])))))
for(l in tablesnames){
  temp <- lapply(dataSource, function(x) x[x[,1]==l, -1])
  if(exists(l)) assign(l, rbind.fill(get(l), rbind.fill(temp))) else assign(l, rbind.fill(temp))
}
#
# now two data frames, a and b, have been created
#
# different method using rbindlist in place of rbind.fill
# (faster and, so far, I have no missing columns to fill)
#
rm(a,b)
library(data.table)
#
tablesnames <- unique(unlist(lapply(dataSource,function(x) as.character(unique(x[,1])))))
for(l in tablesnames){
  temp <- lapply(dataSource, function(x) x[x[,1]==l, -1])
  if(exists(l)) assign(l, rbindlist(list(get(l), rbindlist(temp)))) else assign(l, rbindlist(temp))
}

score 4 · Accepted Answer

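I would suggest using a named list and skipping assign and get altogether. Many of R's nice features (lapply, for example) work very well on lists and do not work with variables managed through assign and get. In addition, you can easily pass a list to a function, whereas that can be cumbersome for a group of variables handled with assign and get.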
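As a rough illustration of the named-list idea applied to the pattern in the question, here is a minimal sketch in base R. The helper add_rows and the example rows are hypothetical, made up for this example; the point is only that tables[[name]] can stand in for both get(name) and assign(name, ...).

```r
# Hypothetical helper: append `rows` to the table stored under `name`,
# creating the list entry if it does not exist yet.
add_rows <- function(tables, name, rows) {
  existing <- tables[[name]]            # NULL if there is no such entry yet
  tables[[name]] <- if (is.null(existing)) rows else rbind(existing, rows)
  tables
}

tables <- list()
name <- "table1"                        # in a real script this is computed at run time

# First batch of rows creates the entry, the second batch is appended to it.
tables <- add_rows(tables, name, data.frame(id = c("A", "B"), var1 = 1:2))
tables <- add_rows(tables, name, data.frame(id = "C", var1 = 3L))

sapply(tables, nrow)                    # works on every table at once
```

Because all tables live in one object, functions such as lapply and sapply can be applied to them directly, and the table names never collide with other variables in the workspace.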

If you want to read a set of files into one big data.frame, I would use something like this (assuming CSV-like text files):

library(plyr)
list_of_files = list.files(pattern = "\\.csv$")  # pattern is a regular expression, not a glob
big_dataframe = ldply(list_of_files, read.csv)

Or, if you want to keep the results in a list:

big_list = lapply(list_of_files, read.csv)

And then possibly combine them with rbind.fill:

big_dataframe = do.call("rbind.fill", big_list)
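Applied to the dataSource example from the question, the same idea might look like the sketch below: bind all files once, then split the result into a named list keyed by the first column. This uses base rbind on the assumption that every file has the same columns; if columns can differ, rbind.fill from plyr could be swapped in. The names all_rows and tables are illustrative.

```r
set.seed(1)
# Same toy data as in the question: two "files", each holding rows for tables a and b.
dataSource <- list(data.frame(fileType = rep(letters[1:2], each = 4),
                              id       = rep(LETTERS[1:4], each = 2),
                              var1     = as.integer(rnorm(8))),
                   data.frame(fileType = rep(letters[1:2], each = 4),
                              id       = rep(LETTERS[1:4], each = 2),
                              var1     = as.integer(rnorm(8))))

# Bind every file once, instead of growing each table inside a loop.
all_rows <- do.call(rbind, dataSource)

# One list element per table name; the fileType column itself is dropped.
tables <- split(all_rows[, -1], all_rows$fileType)

names(tables)
tables[["a"]]
```

Binding once and splitting avoids both the assign/get bookkeeping and the repeated copying that comes from appending to a data frame on every iteration.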
