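I want to import a csv file into R, with the first non-empty line supplying the names of the data frame's columns. I know you can pass the skip argument (e.g. skip = 0) to control which line is read first. However, the line number of the first non-empty line can change from file to file.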

How can I work out how many lines are empty, and skip them dynamically for each file?

As pointed out in the comments, I need to clarify what I mean by "blank". My csv files look like this:

,,,
w,x,y,z
a,b,5,c
a,b,5,c
a,b,5,c
a,b,4,c
a,b,4,c
a,b,4,c

That is, there is a line of commas at the beginning.
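To reproduce the problem, the sketch below writes the sample file shown above to a temporary file (the file name is arbitrary) and reads it with a plain `read.csv` call. Because the comma-only first line is taken as the header, the columns get placeholder names and the real header `w,x,y,z` ends up as the first data row.

```r
# Write the sample file from the question to a temporary location
f <- tempfile(fileext = ".csv")
writeLines(c(",,,",
             "w,x,y,z",
             "a,b,5,c", "a,b,5,c", "a,b,5,c",
             "a,b,4,c", "a,b,4,c", "a,b,4,c"), f)

# A naive read treats ",,," as the header row, so the empty names are
# replaced by placeholders built by make.names()
naive <- read.csv(f)
print(names(naive))
# The real header row is read as ordinary data
print(naive[1, ])
```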

3 Answers

read.csv automatically skips blank lines (unless you set blank.lines.skip=FALSE). See ?read.csv
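As a quick check of that behaviour, here is a minimal sketch using the `text =` argument that `read.csv` passes through to `read.table`, with two genuinely empty lines in front of the header. They are ignored, and `w,x,y,z` becomes the header.

```r
# Two truly empty lines before the header
lines <- c("", "", "w,x,y,z", "a,b,5,c", "a,b,4,c")

# blank.lines.skip = TRUE is the default, so the empty lines are dropped
df <- read.csv(text = lines)
print(names(df))
print(nrow(df))
```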

After I wrote the above, the poster explained that the blank lines are not actually blank: they contain commas with nothing between them. In that case use fread from the data.table package, which handles this. Its skip= argument can be set to any character string found in the header, and reading then starts at the first line containing that string:

library(data.table)
DT <- fread("myfile.csv", skip = "w") # assuming w is in the header
DF <- as.data.frame(DT)

The last line can be omitted if a data.table is ok as the returned value.
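If installing data.table is not an option, the same idea (start reading at the first line that contains a known piece of the header) can be approximated in base R. This is a sketch of a swapped-in technique, not how fread works internally; the helper `read_from_match` and its `pattern` argument are hypothetical names.

```r
# Hypothetical helper: start reading at the first line containing `pattern`
read_from_match <- function(file, pattern, ...) {
  lines <- readLines(file)
  # fixed = TRUE matches the string literally, as a substring of the line
  hit <- grep(pattern, lines, fixed = TRUE)[1]
  if (is.na(hit)) stop("pattern not found in file: ", pattern)
  # skip counts raw lines, so hit - 1 drops everything above the match
  read.csv(file, skip = hit - 1, ...)
}

f <- tempfile(fileext = ".csv")
writeLines(c(",,,", "w,x,y,z", "a,b,5,c", "a,b,4,c"), f)
df <- read_from_match(f, "w,x")
print(names(df))
```

A longer pattern such as `"w,x"` is safer than a single letter, which could also occur in a data row.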

answered 2014-10-20T00:19:40.953

Depending on your file size, this may not be the best solution, but it will do the job.

The strategy here is to read each file as raw lines instead of parsing it with a delimiter, and to store the character count of each line in temp. A while loop then searches for the first line with a non-zero character count, and the file is read again starting from that line and stored as data_filename.

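flist = list.files(pattern = "\\.csv$")
for (onefile in flist) {
  # Character count of each line after dropping commas and whitespace,
  # so that a line such as ",,," counts as empty
  temp = nchar(gsub("[,[:space:]]", "", readLines(onefile)))
  i = 1
  # Move forward to the first line that has any content
  while (i <= length(temp) && temp[i] == 0) {
    i = i + 1
  }
  # Skip the i - 1 empty lines and read the rest of the file
  temp = read.table(onefile, sep = ",", skip = (i - 1))
  # Store the result as data_<filename>
  assign(paste0("data_", onefile), temp)
}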

If the file has a header row, pass header = TRUE to read.table so that the first non-empty line is used for the column names.
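The loop above can also be written without `while` or `assign`: `which()` finds the first line that still has characters once commas and whitespace are stripped, and `lapply` collects the results in a named list, which is usually easier to work with than separate `data_<filename>` variables. A sketch, assuming every matched file is comma-separated with a header row; `first_content_line` and `read_after_blanks` are hypothetical names.

```r
# Index of the first line with content once commas and whitespace are
# removed (so ",,," counts as blank); NA if there is no such line
first_content_line <- function(lines) {
  which(nzchar(gsub("[,[:space:]]", "", lines)))[1]
}

# Read one file, skipping its leading blank or comma-only lines
read_after_blanks <- function(file) {
  i <- first_content_line(readLines(file))
  if (is.na(i)) stop("no non-blank line in ", file)
  read.csv(file, skip = i - 1)
}

# Demo: two files with different numbers of leading comma-only lines
dir <- tempfile()
dir.create(dir)
writeLines(c(",,,", "w,x,y,z", "a,b,5,c"), file.path(dir, "one.csv"))
writeLines(c(",,,", ",,,", ",,,", "w,x,y,z", "a,b,4,c"),
           file.path(dir, "two.csv"))

flist <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
all_data <- lapply(flist, read_after_blanks)
names(all_data) <- basename(flist)
print(all_data[["two.csv"]])
```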

answered 2014-10-20T00:19:22.457

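If the first few lines really are blank, read.csv should automatically skip to the first non-blank line. If they contain commas but no values, you can use: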

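# First pass: the comma-only line is taken as the header, so the real header becomes a data row
df = read.csv(file = 'd.csv')
# Second pass: skip as many lines as it takes to reach the first row with a non-empty first field
df = read.csv(file = 'd.csv', skip = as.numeric(rownames(df[which(df[,1] != ''),])[1]))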

This is not efficient for large files (since each file has to be imported twice), but it works.

If you want to import tab-delimited files with the same problem (a variable number of empty lines), use:

df = read.table(file = 'd.txt', sep = '\t', header = TRUE)
df = read.table(file = 'd.txt', sep = '\t', header = TRUE, skip = as.numeric(rownames(df[which(df[,1] != ''),])[1]))
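The double import can be avoided by reading the raw lines once, dropping the leading lines that contain only the delimiter and/or whitespace, and parsing the remainder from memory through `read.table`'s `text =` argument. A minimal sketch, assuming the delimiter is known for each file and the first non-empty line is the header; `read_trimmed` is a hypothetical helper, and `sep` is passed straight through to `read.table`.

```r
# Hypothetical helper: read `file` once, drop leading delimiter-only or
# whitespace-only lines, and parse the rest with a header row
read_trimmed <- function(file, sep = ",", ...) {
  lines <- readLines(file)
  # Remove every delimiter and whitespace character, then test for content
  stripped <- gsub("[[:space:]]", "", gsub(sep, "", lines, fixed = TRUE))
  first <- which(nzchar(stripped))[1]
  if (is.na(first)) stop("no non-blank line in ", file)
  read.table(text = lines[first:length(lines)], sep = sep, header = TRUE, ...)
}

# Comma-separated file with two comma-only lines in front
csv_file <- tempfile(fileext = ".csv")
writeLines(c(",,,", ",,,", "w,x,y,z", "a,b,5,c", "a,b,4,c"), csv_file)

# Tab-separated file with one tab-only line in front
tsv_file <- tempfile(fileext = ".txt")
writeLines(c("\t\t\t", "w\tx\ty\tz", "a\tb\t5\tc"), tsv_file)

df_csv <- read_trimmed(csv_file)
df_tsv <- read_trimmed(tsv_file, sep = "\t")
print(df_csv)
print(df_tsv)
```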

answered 2014-10-20T00:35:54.220