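I want to read a bunch of Excel files that are all located in the same directory and store them in different worksheets of one consolidated Excel file.
I initially tried XLConnect but kept getting the error GC overhead limit exceeded. I stumbled upon this question, which says this is an issue with Java-based Excel processing packages (such as XLConnect and xlsx). I tried the memory management tips suggested there, but they did not work. A comment on the accepted answer suggested using openxlsx, which is based on Rcpp and therefore avoids this particular problem.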
My current code is as follows:
library(openxlsx)
mnth="January"
files <- list.files(path="./Original Files", pattern=mnth, full.names=T, recursive=FALSE) #pattern match as multiple files are from the same month
# Read them into a list and write to sheet
wb <- createWorkbook()
lapply(files, function(x){
  print(x)
  xlFile <- read.xlsx(xlsxFile = x, sheet = 1, startRow = 2, colNames = T) #Also tried
  str(xlFile)
  #Create a sheet in the new Excel file called Consolidated.xlsx with the month name
  #Append current data in sheet
})
The problem I am running into is this error: Error in read.xlsx.default(xlsxFile = x, sheet = 1, startRow = 2, colNames = T) : openxlsx can not read .xls or .xlm files!
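For context, the error message indicates that openxlsx refuses the legacy .xls format. A rough sketch of one possible workaround is to pick the reader by file extension. This assumes the readxl package is installed (it is not used anywhere in my code above), and read_sheet1 is a hypothetical helper name; I have not verified this against my actual files.

```r
# Sketch: choose a reader based on the file extension.
# Assumption: readxl (for .xls) and openxlsx (for .xlsx) are both installed.
# read_sheet1 is a hypothetical helper, not part of either package.
read_sheet1 <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (ext == "xls") {
    # skip = 1 drops the first row, so row 2 supplies the column names
    as.data.frame(readxl::read_excel(path, sheet = 1, skip = 1))
  } else if (ext == "xlsx") {
    openxlsx::read.xlsx(xlsxFile = path, sheet = 1, startRow = 2, colNames = TRUE)
  } else {
    stop("Unsupported file type: ", path)
  }
}
```

With this in place, lapply(files, read_sheet1) would return a list of data frames regardless of which of the two formats each file uses.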
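I have made sure that the files variable contains all the files of interest (e.g. January 2015.xls, January 2016.xls, etc.). I have also made sure that the path to the files is correct and that the Excel files actually exist there.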
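The checks described here can be sketched roughly as follows. To keep the example self-contained it uses a temporary directory with empty placeholder files as a stand-in for "./Original Files"; the February file name is hypothetical and only there to show the pattern filter at work.

```r
# Hypothetical stand-in for "./Original Files": a temp directory with empty placeholder files
src_dir <- file.path(tempdir(), "Original Files")
dir.create(src_dir, showWarnings = FALSE)
file.create(file.path(src_dir, c("January 2015.xls", "January 2016.xls", "February 2015.xls")))

mnth <- "January"
files <- list.files(path = src_dir, pattern = mnth, full.names = TRUE, recursive = FALSE)

# Every matched path should exist on disk
stopifnot(all(file.exists(files)))

# tools::file_ext() returns the extension without the leading dot;
# it shows which format each matched file is in
ext <- tolower(tools::file_ext(files))
print(data.frame(file = basename(files), ext = ext))
```

Listing the extensions next to the file names makes it easy to see that every matched file is in the .xls format named in the error message.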
I have left the writing to Excel as skeleton code, because I first need to solve the problem of reading the files.
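For completeness, here is a minimal sketch of what the writing part might look like once reading works. The small data frames are hypothetical stand-ins for the per-file data, the output goes to a temporary directory rather than the working directory, and the sketch assumes all files share the same column names.

```r
# Hypothetical stand-ins for the data frames read from each monthly file
df_list <- list(
  data.frame(id = 1:2, value = c(10.5, 20.1)),
  data.frame(id = 3L, value = 30.7)
)
mnth <- "January"
out_file <- file.path(tempdir(), "Consolidated.xlsx")

# Stack the per-file data frames (assumes identical column names)
combined <- do.call(rbind, df_list)

# Only attempt the write if openxlsx is available
if (requireNamespace("openxlsx", quietly = TRUE)) {
  wb <- openxlsx::createWorkbook()
  openxlsx::addWorksheet(wb, sheetName = mnth)         # one sheet per month
  openxlsx::writeData(wb, sheet = mnth, x = combined)  # data starts at cell A1
  openxlsx::saveWorkbook(wb, file = out_file, overwrite = TRUE)
}
```

Repeating the addWorksheet and writeData calls for each month before a single saveWorkbook call would put every month on its own sheet in the same file.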
In case it helps, here is the code from my attempt with XLConnect:
library(XLConnect)
setwd("D:/something/something")
mnth="January"
files <- list.files(path="./Original Files", pattern=mnth, full.names=T, recursive=FALSE)
# Read them into a list
df.list = lapply(files, readWorksheetFromFile, sheet=1, startRow=2)
#combine them into a single data frame and write to disk:
df = do.call(rbind, df.list)
rm(df.list)
outputFileName<-"Consolidated.xlsx"
# Load workbook (create if not existing)
wb <- loadWorkbook(outputFileName, create = TRUE)
createSheet(wb, name = mnth)
writeWorksheet(wb,df,sheet = mnth)
#write.xlsx2(df, outputFileName, sheetName = mnth, col.names = T, row.names = F, append = TRUE)
saveWorkbook(wb)
rm(df)
gc()