r - Reading a large file line by line in R without the header

Question

I have a very large data file (several gigabytes), and when I try to open it in R, I get an out-of-memory error.

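I need to read the file line by line and run some analysis on it. I found a previous question on this topic in which the file is read n lines at a time, using clumps to jump to particular lines. I used the answer by "Nick Sabbe" and made some modifications to fit my needs.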

Suppose I have the following sample test.csv file:

A    B    C
200 19  0.1
400 18  0.1
300 29  0.1
800 88  0.1
600 80  0.1
150 50  0.1
190 33  0.1
270 42  0.1
900 73  0.1
730 95  0.1

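I want to read the contents of the file line by line and perform my analysis. So I wrote the following loop, based on the code posted by "Nick Sabbe", to do the reading. I have two problems: 1) the header is printed every time a new line is printed; 2) R's index column "X" is also printed, even though I am deleting that column.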

Here is the code I am using:

test<-function(){
 prev<-0

for(i in 1:100){
  j<-i-prev
  test1<-read.clump("file.csv",j,i)
  print(test1)
  prev<-i

}
}
####################
# Code by Nick Sabbe
###################
read.clump <- function(file, lines, clump, readFunc=read.csv,
                   skip=(lines*(clump-1))+ifelse((header) & (clump>1) & (!inherits(file, "connection")),1,0),
                   nrows=lines,header=TRUE,...){
if(clump > 1){
colnms<-NULL
if(header)
{
  colnms<-unlist(readFunc(file, nrows=1, header=F))
  #print(colnms)
}
p = readFunc(file, skip = skip,
             nrows = nrows, header=FALSE,...)
if(! is.null(colnms))
{
  colnames(p) = colnms
}
} else {
 p = readFunc(file, skip = skip, nrows = nrows, header=header)
}
p$X<-NULL   # Note: Here I'm setting the index to NULL
return(p)
}

The output I get:

       A       B    C
1      200      19   0.1
  NA   1       1     1
1  2   400     18   0.1
  NA   1       1    1
1  3   300     29   0.1
  NA   1       1    1
1  4   800     88   0.1
  NA   1       1    1
1  5   600     80   0.1

I would like to get rid of the extra line that is read each time:

 NA   1       1     1

Also, is there a way to stop the for loop when the end of the file is reached, like EOF in other languages?

score 5 · Accepted Answer

Maybe something like this can help you:

inputFile <- "foo.txt"
con  <- file(inputFile, open = "r")
while (length(oneLine <- readLines(con, n = 1)) > 0) {
  myLine <- unlist((strsplit(oneLine, ",")))
  print(myLine)
} 
close(con)
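As a minimal, self-contained sketch of the same readLines loop, the version below also consumes the header once before the loop and converts each line to numbers. The sample file, the comma separator, and the names sample_path, rows_read, and sum_A are assumptions made for illustration; they are not part of the original question or answer.

```r
# Build a small comma-separated sample file (hypothetical data, for illustration only).
sample_path <- tempfile(fileext = ".csv")
writeLines(c("A,B,C", "200,19,0.1", "400,18,0.1", "300,29,0.1"), sample_path)

con <- file(sample_path, open = "r")

# Read the header exactly once, so it never shows up again inside the loop.
header <- strsplit(readLines(con, n = 1), ",")[[1]]

rows_read <- 0
sum_A <- 0

# readLines() returns character(0) at end of file, which ends the loop.
while (length(one_line <- readLines(con, n = 1)) > 0) {
  values <- as.numeric(strsplit(one_line, ",")[[1]])
  names(values) <- header
  rows_read <- rows_read + 1
  sum_A <- sum_A + values[["A"]]   # running aggregate; only one line is held in memory
}
close(con)

print(rows_read)
print(sum_A)
```

Because the connection stays open between calls, each readLines call continues where the previous one stopped, so no skip arithmetic is needed and the zero-length result plays the role of an EOF check.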

Or use scan to avoid the splitting, as suggested by @MatthewPlourde.

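Here I use scan: the header line is read once before the loop so that it is skipped, and quiet = TRUE suppresses the message reporting how many items were read. (Passing skip=1 to scan inside the loop would skip a line on every iteration, not just the header.)

con <- file(inputFile, open = "r")
header <- readLines(con, n = 1)  # consume the header once, before the loop
while (length(myLine <- scan(con, what = numeric(), nlines = 1, sep = ",", quiet = TRUE)) > 0) {
   ## here I just print, but you would process your line here
   print(myLine)
}
close(con)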
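If one line at a time turns out to be too slow, the same open-connection idea can be applied to blocks of rows. The following is only a sketch under stated assumptions: the sample file, chunk_size, and the running total are hypothetical, and the end of the file is detected by catching the error that read.csv raises when no lines remain.

```r
# Hypothetical comma-separated sample file, for illustration only.
sample_path <- tempfile(fileext = ".csv")
writeLines(c("A,B,C", "200,19,0.1", "400,18,0.1", "300,29,0.1",
             "800,88,0.1", "600,80,0.1"), sample_path)

chunk_size <- 2  # rows per chunk; an assumed value, tune it to the available memory
con <- file(sample_path, open = "r")

# Consume the header once and reuse the names for every chunk.
col_names <- strsplit(readLines(con, n = 1), ",")[[1]]

chunk_sizes <- integer(0)
total_B <- 0
repeat {
  # On an open connection read.csv continues from the current position.
  # At end of file it raises an error, which is turned into NULL here.
  # Note that this handler also hides other read errors.
  chunk <- tryCatch(
    read.csv(con, header = FALSE, nrows = chunk_size, col.names = col_names),
    error = function(e) NULL
  )
  if (is.null(chunk) || nrow(chunk) == 0) break
  chunk_sizes <- c(chunk_sizes, nrow(chunk))
  total_B <- total_B + sum(chunk$B)   # per-chunk analysis goes here
}
close(con)

print(chunk_sizes)
print(total_B)
```

Each chunk arrives as an ordinary data frame with the right column names, so the header is never re-read and no skip offset has to be computed by hand.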

score 0 · Answer

I suggest you check out chunked and disk.frame. Both have functions for reading CSV files.

disk.frame::csv_to_disk.frame may be the function you are looking for.
