I want to read a bunch of Excel files that all sit in the same directory and store them in different worksheets of a single consolidated Excel file.

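I initially tried XLConnect, but I kept getting the error GC overhead limit exceeded. I came across this question, which says this is a known problem with Java-based Excel packages (such as XLConnect and xlsx). I tried the memory-management tips suggested there, but they did not work. A comment on the accepted answer suggested using openxlsx, which is based on Rcpp and therefore avoids this particular problem.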

My current code is as follows:

library(openxlsx)
mnth="January"
files <- list.files(path="./Original Files", pattern=mnth, full.names=T, recursive=FALSE)  #pattern match as multiple files are from the same month
# Read them into a list and write to sheet
wb <- createWorkbook()
lapply(files, function(x){
  print(x)
  xlFile<-read.xlsx(xlsxFile = x, sheet = 1, startRow = 2, colNames = T)  #Also tried
  str(xlFile)
  #Create a sheet in the new Excel file called Consolidated.xlsx with the month name
  #Append current data in sheet
})
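The `pattern` argument of `list.files()` is treated as a regular expression, so "January" matches any file name that contains the word anywhere, including the hidden lock files (names starting with `~$`) that Excel creates while a workbook is open. A minimal sketch of a stricter pattern, assuming the files are named like "January 2015.xls"; the anchoring is an illustration, not part of the original code:

```r
mnth <- "January"
# Illustrative stricter pattern: the name must start with the month
# and end in .xls or .xlsx, which also excludes "~$" lock files
pattern <- paste0("^", mnth, ".*\\.xlsx?$")

candidates <- c("January 2015.xls", "January 2016.xlsx",
                "Notes January.txt", "~$January 2015.xls")
grepl(pattern, candidates)
# [1]  TRUE  TRUE FALSE FALSE
```

The same string can then be passed as `list.files(path = "./Original Files", pattern = pattern, full.names = TRUE)`.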

The problem I'm running into is this error: Error in read.xlsx.default(xlsxFile = x, sheet = 1, startRow = 2, colNames = T) : openxlsx can not read .xls or .xlm files!

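I have made sure that the files variable contains all the files of interest (e.g. January 2015.xls, January 2016.xls, etc.). I have also checked that the path to the files is correct and that the Excel files really exist there.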
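The error message says that `read.xlsx()` in openxlsx only reads the newer .xlsx format, while the listed files are .xls. One possible workaround, sketched here under the assumption that the readxl package is installed (it is not used anywhere in the original code), is `readxl::read_excel()`, which reads both .xls and .xlsx without Java. `skip = 1` plays the role of `startRow = 2`; `read_month_files` is a hypothetical helper name:

```r
library(readxl)  # assumption: readxl is installed; reads .xls and .xlsx without Java

# Hypothetical helper: read the first sheet of each file, skipping row 1
# so that row 2 is used as the header (like startRow = 2 in read.xlsx)
read_month_files <- function(files) {
  lapply(files, function(x) {
    message("Reading ", x)
    read_excel(path = x, sheet = 1, skip = 1)
  })
}
```

With the `files` vector from above, `df.list <- read_month_files(files)` would give one data frame (tibble) per file.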

I have left the Excel-writing part as skeleton code, because I need to solve the file-reading problem first.
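For reference, the writing step described in the skeleton comments (a sheet named after the month in Consolidated.xlsx, holding that month's data) could look roughly like this with openxlsx. This is a sketch, assuming the per-file data frames are already in a list with matching columns; `write_month_sheet` is a hypothetical helper, and replacing an existing sheet of the same name is a design choice, not something the question specifies:

```r
library(openxlsx)

# Hypothetical helper: stack one month's data frames and write them to a sheet
# named after the month, keeping any other sheets already in the output file
write_month_sheet <- function(df.list, mnth, outputFileName = "Consolidated.xlsx") {
  df <- do.call(rbind, df.list)  # assumes identical columns in every file
  wb <- if (file.exists(outputFileName)) loadWorkbook(outputFileName) else createWorkbook()
  if (mnth %in% names(wb)) removeWorksheet(wb, mnth)  # overwrite this month's sheet
  addWorksheet(wb, mnth)
  writeData(wb, sheet = mnth, x = df)
  saveWorkbook(wb, outputFileName, overwrite = TRUE)
  invisible(df)
}
```

Loading the existing file before adding the sheet is what lets each run add a new month instead of overwriting the whole workbook.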

In case it helps, here is my attempt with XLConnect:

library(XLConnect)

setwd("D:/something/something")
mnth="January"
files <- list.files(path="./Original Files", pattern=mnth, full.names=T, recursive=FALSE)
# Read them into a list
df.list = lapply(files, readWorksheetFromFile, sheet=1, startRow=2)
#combine them into a single data frame and write to disk:
df = do.call(rbind, df.list)
rm(df.list)
outputFileName<-"Consolidated.xlsx"
# Load workbook (create if not existing)
wb <- loadWorkbook(outputFileName, create = TRUE)
createSheet(wb, name = mnth)
writeWorksheet(wb,df,sheet = mnth)
#write.xlsx2(df, outputFileName, sheetName = mnth, col.names = T, row.names = F, append = TRUE)
saveWorkbook(wb)

rm(df)
gc()