r - read.table in chunks - error message

Question

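I have a large file with 6 million rows, and I'm trying to read the data in chunks for processing so that I don't hit my RAM limit. Here is my code (note that temp.csv is just a dummy file with 41 records):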

library(data.table)

infile <- file("data/temp.csv", open="r")

headers <- as.character(read.table(infile, header = FALSE, nrows=1, sep=",", stringsAsFactors=FALSE))

while(length(temp <-read.table(infile, header = FALSE, nrows=10, sep=",", stringsAsFactors=FALSE)) > 0){
  temp <- data.table(temp)
  setnames(temp, colnames(temp), headers)
  setkey(temp, Id)
  print(temp[1, Tags])
}

print("hi")

close(infile)

Everything goes fine until the last iteration, where I get this error message:

Error in read.table(infile, header = FALSE, nrows = 10, sep = ",", stringsAsFactors = FALSE) : 
  no lines available in input
In addition: Warning message:
In read.table(infile, header = FALSE, nrows = 10, sep = ",", stringsAsFactors = FALSE) :
  incomplete final line found by readTableHeader on 'data/temp.csv'

Presumably this is because the last iteration has only 1 record, while read.table expects 10?

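All of the data is actually read correctly. Surprisingly, even in the last iteration, temp is still converted to a data.table. But print("hi") and everything after it never runs. Is there anything I can do to fix this?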

Thank you.

score 2 · Accepted Answer

Ah, got it!

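repeat{
  # Read the next chunk of up to 10 rows from the open connection
  temp <-read.table(infile, header = FALSE, nrows=10, sep=",", stringsAsFactors=FALSE)

  temp <- data.table(temp)
  setnames(temp, colnames(temp), headers)
  setkey(temp, Id)
  print(temp[1, Tags])

  # A short chunk means the end of the file was reached, so stop here.
  # Note: if the row count is an exact multiple of 10, the next read.table
  # call still fails with "no lines available in input".
  if (nrow(temp) < 10) break
}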

print("hi")

This still produces a warning message, but the error no longer occurs:

Warning message:
In read.table(infile, header = FALSE, nrows = 10, sep = ",", stringsAsFactors = FALSE) :
  incomplete final line found by readTableHeader on 'data/temp.csv'
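
The warning itself only says that the file does not end with a newline character. As for the error, it is raised when read.table is called after the input is already exhausted, so the loop above can still hit it when the number of data rows is an exact multiple of the chunk size. The following is a minimal sketch of one way around that, not the approach from the original post: it relies on the documented behaviour that read.table returns a 0-row data frame for empty input when col.names is supplied. The function name process_in_chunks and its handle argument are hypothetical, introduced only for illustration.

```r
# Hypothetical helper, not from the original post: apply `handle` to each
# chunk of up to `chunk_size` rows and return the number of chunks processed.
process_in_chunks <- function(path, chunk_size = 10, handle = print) {
  infile <- file(path, open = "r")
  on.exit(close(infile))  # close the connection even if an error occurs

  # Read the header line once; it supplies the column names for every chunk.
  headers <- as.character(read.table(infile, header = FALSE, nrows = 1,
                                     sep = ",", stringsAsFactors = FALSE))
  n_chunks <- 0
  repeat {
    # With col.names supplied, exhausted input yields a 0-row data frame
    # instead of the "no lines available in input" error.
    temp <- read.table(infile, header = FALSE, nrows = chunk_size, sep = ",",
                       col.names = headers, stringsAsFactors = FALSE)
    if (nrow(temp) == 0) break
    handle(temp)
    n_chunks <- n_chunks + 1
  }
  n_chunks
}
```

To reproduce the processing from the question, the callback passed as handle could convert each chunk with data.table(chunk), call setkey on Id, and print the first Tags value. Because the column names are passed to read.table directly, the separate setnames step is not needed in this variant.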
