r - Loading new files in a directory

Question

I have an R script that loads multiple text files from a directory and saves the data as a compressed .rda file. It looks like this:

#!/usr/bin/Rscript --vanilla

args <- commandArgs(TRUE)
## args[1] is the folder name

outname <- paste0(args[1], ".rda")

## "\\.txt$" matches only names ending in .txt (the regex ".txt" would also
## match e.g. "mytxt.csv"); full.names = TRUE keeps the directory in the path
files <- list.files(path = args[1], pattern = "\\.txt$", full.names = TRUE)

tmp <- list()
if (file.exists(outname)) {
  message("found ", outname)
  load(outname)
  tmp <- get(args[1])                  # previously read data
  files <- setdiff(files, names(tmp))  # keep only files not read yet
}

## list.files() and setdiff() return character(0), never NULL, so test length
if (length(files) == 0) {
  message("no new files")
} else {
  ## read the new files into a list of data frames
  results <- plyr::llply(files, read.table, .progress = "text")
  names(results) <- files

  assign(args[1], c(tmp, results))
  message("now saving... ", args[1])
  save(list = args[1], file = outname)
}
message("all done!")

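These files are quite large (about 15 MB each, usually around 50 of them), so running this script typically takes several minutes, a large part of which is spent writing the .rda result.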

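I often add new data files to the directory, so I want to append them to the previously saved and compressed data. That is what the script above does, by checking whether an output file with that name already exists. The last step, saving the .rda file, is still slow.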

Is there a smarter way to handle this, perhaps in some package, that keeps track of which files have already been read and saves faster?
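One possible alternative, offered here as an assumption and not as an existing package feature, is to cache each text file as its own .rds file. Appending then only costs reading and writing the new files, instead of rewriting one large .rda every time. The helper names `update_cache()` and `load_cache()` and the `cache` subdirectory are hypothetical; the sketch uses only base R (`list.files`, `saveRDS`, `readRDS`).

```r
# Hypothetical helper: cache each .txt file as its own .rds file so that
# only files that have not been cached yet are read and written.
update_cache <- function(folder, cache_dir = file.path(folder, "cache")) {
  dir.create(cache_dir, showWarnings = FALSE)
  txt <- list.files(folder, pattern = "\\.txt$", full.names = TRUE)
  # Cache file name derived from the source file name (assumed convention)
  rds <- file.path(cache_dir, paste0(basename(txt), ".rds"))
  new <- !file.exists(rds)
  for (i in which(new)) {
    # compress = FALSE trades disk space for faster writes and reads
    saveRDS(read.table(txt[i]), rds[i], compress = FALSE)
  }
  message(sum(new), " new file(s) cached")
  invisible(txt[new])
}

# Hypothetical helper: load everything cached so far into a named list
load_cache <- function(folder, cache_dir = file.path(folder, "cache")) {
  rds <- list.files(cache_dir, pattern = "\\.txt\\.rds$", full.names = TRUE)
  setNames(lapply(rds, readRDS), sub("\\.rds$", "", basename(rds)))
}
```

Calling `update_cache(folder)` after new files arrive writes only those files; `load_cache(folder)` rebuilds the full list when it is needed. Note that this tracks files by name only, so a file that is modified in place would not be re-read.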

I see that knitr uses tools:::makeLazyLoadDB to save its cached computations, but this function is undocumented, so I'm not sure where it makes sense to use it.

score 6 · Accepted Answer

For intermediate files that I need to read (or write) often, I use

save(..., compress = FALSE)

This speeds things up considerably.
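A small sketch for checking the trade-off on your own machine: it saves the same object with the default gzip compression and with `compress = FALSE`, then compares file sizes and elapsed write times. The matrix `x` is a made-up stand-in for the real data, and the timings will vary between machines and data sets.

```r
# Made-up stand-in for the real data: a repetitive integer matrix
x <- matrix(rep(1:10, length.out = 1e6), ncol = 10)

f_gz  <- tempfile(fileext = ".rda")
f_raw <- tempfile(fileext = ".rda")

t_gz  <- system.time(save(x, file = f_gz))                     # default: gzip
t_raw <- system.time(save(x, file = f_raw, compress = FALSE))  # no compression

cat("gzip:", file.size(f_gz), "bytes,", t_gz[["elapsed"]], "s\n")
cat("none:", file.size(f_raw), "bytes,", t_raw[["elapsed"]], "s\n")

# The uncompressed file loads back to the same object
e <- new.env()
load(f_raw, envir = e)
stopifnot(identical(e$x, x))
```

The cost of skipping compression is a larger file on disk, which matters less for intermediate files that are rewritten often.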
