r - read.xls() from the gdata package fails with "no lines available in input"

Question

I am using read.xls() from the gdata package to read Excel workbooks, each of which contains a single worksheet. The read fails with the following error.

> read.xls(list.files[[1]])
Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  no lines available in input

I can't make sense of the error. sheetCount() returns the following error.

> sheetCount(list.files[[1]])
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 161 did not have 13 elements

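But this is strange, because the workbook has 27 columns. Nothing looks unusual around row 161 +/- 1 or column 13 +/- 1.

Throughout the workbook, repeated entries are left blank, so they would have to be filled down by hand (which is impractical for the 750+ workbooks I want to read).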
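Once a sheet has been read into R, the blank repeated entries can be filled down programmatically rather than by hand. A minimal sketch in base R, assuming blanks arrive as either "" or NA; `fill_down` and the `region` vector are hypothetical names used only for illustration:

```r
# Carry the last non-blank value forward (hypothetical helper, base R only).
fill_down <- function(x) {
  # Treat both NA and the empty string as "blank".
  blank <- is.na(x) | x == ""
  # For each position, the index of the most recent non-blank entry
  # (0 where no non-blank entry has been seen yet).
  idx <- cummax(seq_along(x) * !blank)
  out <- x
  # Leading blanks (idx == 0) are left untouched.
  out[idx > 0] <- x[idx[idx > 0]]
  out
}

# Made-up column in which repeated entries were left blank.
region <- c("North", "", "", "South", NA, "East")
filled <- fill_down(region)
print(filled)
```

Applied column by column (for example with lapply() over a data frame), this avoids editing each workbook manually. The zoo package's na.locf() does a similar job if blanks are first converted to NA.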

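I tried manually setting quote='' and quote='\'', but neither changes the output. Is my problem that read.xls() sees some rows as ragged and others as missing? Any pointers? (I tried the xlsReadWrite package, but I am on 64-bit Win 7 and it only works on 32-bit systems.)

Thanks!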

Update

I followed @G. Grothendieck's suggestion and got the following.

> k <- count.fields(xls2csv(list.xls[[1]]), sep = ","); k
NULL
> L <- readLines(xls2csv(list.xls[[1]])); L
character(0)

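The temporary file generated by xls2csv() is empty, so now I can't figure out why my sheetCount() call returns the "line 161 did not have 13 elements" error.

I also followed @Joran's suggestion and converted the .xls file to .csv in LibreOffice. It converts and reads fine (i.e., it counts 27 fields in all 236 rows and the readLines() output looks sensible).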
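The same field-count check can be run directly on a CSV exported by LibreOffice. A minimal sketch, assuming a small made-up file stands in for the real export (`csv_path` and its contents are hypothetical):

```r
# Hypothetical stand-in for a CSV exported from LibreOffice.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,value", "1,a,10", "2,b,20", "3,c"), csv_path)

# Number of comma-separated fields on each line of the file.
k <- count.fields(csv_path, sep = ",")
print(table(k))

# Line numbers whose field count differs from the header line's.
ragged <- which(k != k[1])
print(ragged)
```

An empty `ragged` vector means every line has as many fields as the header, which is the situation described for the LibreOffice export.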

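Update 2

I should add that I don't think these .xls files were generated by Excel (my source is a bit secretive about where they come from), but I get no errors or warnings when I open them in LibreOffice.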

score 1 · Accepted Answer

Try this and see whether it suggests anything:

library(gdata)
k <- count.fields(xls2csv("myfile.xls"), sep = ","); k
L <- readLines(xls2csv("myfile.xls")); L

score 1 · Accepted Answer

In my case I think the problem is that the .xls to .csv Perl script fails (this is what gdata uses). I am still not sure why because LibreOffice converts the .xls to .csv with no warnings. I inspected the .csv with Vim and it looks normal (i.e., no crazy characters). I think the .xls is poorly formed by some proprietary script, so the Perl script fails.

Because LibreOffice works here, the easiest solution is to use command-line LibreOffice (i.e., none of the Perl-based tools will work). I am on Win7, so I wrote a simple .bat file that converts every .xls in a directory.

for %%f in (*.xls) do soffice.exe -invisible -convert-to csv "%%f"
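After the batch conversion, the resulting .csv files still have to be loaded into R. A minimal sketch, assuming the converted files sit in one directory; `read_csv_dir` is a hypothetical helper, and the temporary directory below only simulates the converted output:

```r
# Read every .csv in a directory into a named list of data frames.
# `read_csv_dir` is a hypothetical helper, not part of gdata or LibreOffice.
read_csv_dir <- function(dir) {
  paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  names(paths) <- basename(paths)
  lapply(paths, function(p) {
    # Return NULL instead of stopping when a file cannot be parsed,
    # e.g. an empty file ("no lines available in input").
    tryCatch(read.csv(p, stringsAsFactors = FALSE),
             error = function(e) NULL)
  })
}

# Simulated conversion output: two valid files and one empty file.
csv_dir <- tempfile("csvdir")
dir.create(csv_dir)
writeLines(c("x,y", "1,2", "3,4"), file.path(csv_dir, "a.csv"))
writeLines(c("x,y", "5,6"), file.path(csv_dir, "b.csv"))
file.create(file.path(csv_dir, "empty.csv"))

sheets <- read_csv_dir(csv_dir)
# Names of files that failed to parse.
failed <- names(sheets)[vapply(sheets, is.null, logical(1))]
print(failed)
```

Collecting failures instead of stopping on the first one is a design choice that suits a batch of 750+ files, where a handful of bad conversions should not abort the whole run.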

score 1 · Accepted Answer

Use XLConnect!

library(XLConnect)
readWorksheetFromFile(list.files[[1]], 1, useCachedValues=TRUE)
