
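I have read about 20 files into data frames, and I want to build a pipeline so that, in the future, I can merge these files and others into one big data frame. I know of this particular function, written for cases where more than two CSVs need to be merged: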

multmerge = function(mypath){
  filenames = list.files(path = mypath, full.names = TRUE)
  datalist = lapply(filenames, function(x){read.csv(file = x, header = TRUE)})
  Reduce(function(x, y) {merge(x, y, all = TRUE)}, datalist)
}

MergedData = multmerge("file path")
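For readers unfamiliar with `Reduce()`: it folds a two-argument function over a list from the left, so with three data frames the function above computes `merge(merge(df1, df2, all = TRUE), df3, all = TRUE)`. A small self-contained sketch with made-up data frames (`df1`, `df2`, `df3` and their columns are illustrative, not from the question):

```r
# Three small data frames that share the key column "id" (made-up example data)
df1 <- data.frame(id = c(1, 2, 3), a = c("x", "y", "z"))
df2 <- data.frame(id = c(2, 3, 4), b = c(TRUE, FALSE, TRUE))
df3 <- data.frame(id = c(1, 4), score = c(10, 40))

datalist <- list(df1, df2, df3)

# Reduce() applies the function pairwise from the left:
# first merge(df1, df2), then merge(<that result>, df3)
merged <- Reduce(function(x, y) merge(x, y, all = TRUE), datalist)

# The same computation written out by hand
by_hand <- merge(merge(df1, df2, all = TRUE), df3, all = TRUE)

print(identical(merged, by_hand))  # TRUE
print(merged)  # one row per id (1 to 4), NA where a frame had no such id
```

Because `all = TRUE` requests a full outer join at every step, ids that appear in only one file are kept and padded with `NA`.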
                       

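However, it doesn't work for me (error message pasted below). I think one possible reason is that my data frames have several shared variables (columns with the same name). Unfortunately, the code above doesn't let me specify which variables to merge on. Is there a way to improve the function, or a completely different approach to doing this?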
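The concern about shared variable names can be checked directly. When `by` is not given, `merge()` joins on `intersect(names(x), names(y))`, i.e. on every column the two data frames have in common. A minimal sketch with hypothetical data, where `value` is a non-key column that happens to have the same name in both frames:

```r
# Hypothetical data: "id" and "year" are the real keys; "value" has the same
# name in both frames but is NOT meant to be matched on.
x <- data.frame(id = c(1, 2), year = c(2019, 2019), value = c(5, 6))
y <- data.frame(id = c(1, 2), year = c(2019, 2019), value = c(50, 60))

# Default by = intersect(names(x), names(y)) = id, year AND value.
# The value columns never agree, so the full join keeps all rows separate.
default_join <- merge(x, y, all = TRUE)
print(nrow(default_join))   # 4

# Explicit keys: only id and year are matched; the two "value" columns are
# kept side by side with the default suffixes ".x" and ".y".
keyed_join <- merge(x, y, by = c("id", "year"), all = TRUE)
print(names(keyed_join))    # "id" "year" "value.x" "value.y"
```

With explicit keys, same-named non-key columns are carried along and suffixed rather than silently used to match rows.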

Error:

Error in type.convert.default(data[[i]], as.is = as.is[i], dec = dec,  : 
  invalid multibyte string at '<a7>Y<95>s<6e><e9><87>A<f9>'
In addition: Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 4 appears to contain embedded nulls
3: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  line 5 appears to contain embedded nulls
4: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  EOF within quoted string
5: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
 

Thanks!
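Note that the error and warnings above come from `read.table()`, `scan()` and `type.convert()`, which `read.csv()` calls internally, so the failure happens while the files are being read, before `merge()` ever runs. Warnings such as "embedded nulls" and "invalid multibyte string" often indicate that a file is not plain text in the expected encoding (for example a UTF-16 export, or a non-CSV file such as a spreadsheet in the same folder); that diagnosis is an assumption, since the files themselves are not shown. A sketch that reads only `*.csv` files and lets the caller pass `fileEncoding` (the helper name `read_csv_dir` is hypothetical):

```r
# Hypothetical helper: read every *.csv file in a folder with a chosen encoding.
# If the files were exported as UTF-16, encoding = "UTF-16LE" may help
# (an assumption; check how your files were actually produced).
read_csv_dir <- function(mypath, encoding = "UTF-8") {
  # pattern keeps only names ending in .csv (case-insensitive), so stray
  # files such as .xlsx workbooks or notes are never passed to read.csv()
  filenames <- list.files(path = mypath, pattern = "\\.csv$",
                          ignore.case = TRUE, full.names = TRUE)
  lapply(filenames, function(f) {
    read.csv(f, header = TRUE, fileEncoding = encoding)
  })
}

# Demo: two CSV files plus one non-CSV file in a temporary folder
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, a = c("x", "y")),
          file.path(demo_dir, "one.csv"), row.names = FALSE)
write.csv(data.frame(id = 2:3, b = c(TRUE, FALSE)),
          file.path(demo_dir, "two.csv"), row.names = FALSE)
writeLines("not a csv", file.path(demo_dir, "notes.txt"))

datalist <- read_csv_dir(demo_dir)
print(length(datalist))  # 2: notes.txt was skipped
```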


1 Answer


In `merge`, we can specify the columns to join on with the `by` argument:

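multmerge <- function(mypath){
  filenames <- list.files(path = mypath, full.names = TRUE)
  # read every file in the folder into a list of data frames
  datalist <- lapply(filenames, read.csv, header = TRUE)
  # 'col1' and 'col2' stand for the shared key columns; replace with your own names
  Reduce(function(...) merge(..., all = TRUE, by = c('col1', 'col2')), datalist)
}

MergedData <- multmerge("file path")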
answered 2020-04-27 at 20:07:08
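As a usage sketch of the answer's approach, here is a variant that takes the key columns as an argument instead of hard-coding `c('col1', 'col2')`, and that also restricts the listing to `*.csv` files. It is run on three made-up files written to a temporary folder; the parameter name `keys`, the folder and the file contents are all hypothetical:

```r
# Variant of the answer's function; "keys" is a hypothetical parameter name
multmerge <- function(mypath, keys) {
  # only *.csv files are read (a change from the answer, which reads everything)
  filenames <- list.files(path = mypath, pattern = "\\.csv$", full.names = TRUE)
  datalist <- lapply(filenames, read.csv, header = TRUE)
  # full outer join on the given key columns only, folded over all files
  Reduce(function(x, y) merge(x, y, by = keys, all = TRUE), datalist)
}

# Demo: three made-up files sharing the key columns "id" and "year"
demo_dir <- file.path(tempdir(), "multmerge_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = c(1, 2), year = 2019, sales = c(10, 20)),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = c(2, 3), year = 2019, staff = c(4, 5)),
          file.path(demo_dir, "b.csv"), row.names = FALSE)
write.csv(data.frame(id = c(1, 3), year = 2019, region = c("N", "S")),
          file.path(demo_dir, "c.csv"), row.names = FALSE)

MergedData <- multmerge(demo_dir, keys = c("id", "year"))
print(MergedData)  # one row per id, with NA where a file lacked that id
```

Because `by` is given explicitly, columns not listed in `keys` are never used to match rows.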