r - Reading only selected columns with read.table when the number of columns is unknown

Question

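I want to read the first 3 columns from many files, and I do not necessarily know how many columns each file contains. I also do not know exactly how many lines to skip in each file, although the header line is never preceded by more than 19 lines.

My question is similar to these questions:

However, my problem differs in that I know neither the number of columns in the files to be imported nor the exact number of lines to skip. I only want to import the first three columns of each file, and their names are consistent (Date/Time, Unit, Value).

The read.table solutions in the linked questions require knowing the number of columns in the file and specifying a colClasses entry for each column. I am trying to read thousands of files with lapply, where the input is a list of .csv files and read.table is applied to each file: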

lapply(files, read.table, skip = 19, header = TRUE, sep = ",")
# Secondary issue: the number of lines to skip varies, up to 19.

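Is there a way to get around not knowing the number of columns in advance?

Edit: I have adapted the answer provided by @asb to fit my problem, and it works well.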

my.read.table <- function(file, sep = ",",
                          colClasses3 = c("factor", "factor", "numeric"), ...) {
  ## Read the first 20 lines once; the header is known to be among them.
  head.lines <- readLines(file, n = 20)

  ## Locate the line of interest, where "Date/Time,Unit,Value" appears.
  header.idx <- grep("Date/Time,Unit,Value", head.lines, fixed = TRUE)[1]

  ## Determine the number of lines to skip (max possible = 19).
  skip.number <- header.idx - 1

  ## Split the header line on the separator to find the number of columns.
  ## fixed = TRUE avoids needing to escape the separator.
  ncols <- length(strsplit(head.lines[header.idx], sep, fixed = TRUE)[[1]])

  ## Use ncols in `colClasses`: "NULL" drops every column after the third.
  read.table(file, sep = sep, header = TRUE, skip = skip.number,
             colClasses = c(colClasses3, rep("NULL", ncols - 3)), ...)
}
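As a rough check of the steps in this function, the same logic can be run on a small made-up file. The file contents, the two preamble lines, and the extra column names (Flag, Note) are all hypothetical, and "character" is used for the first two columns only to keep the output easy to inspect.

```r
# Hypothetical sample file: 2 preamble lines, then the header, then data.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Logger export",
  "Site: A",
  "Date/Time,Unit,Value,Flag,Note",
  "2013-01-01 00:00,degC,1.5,ok,x",
  "2013-01-01 01:00,degC,2.5,ok,y"
), tmp)

# Find the header line among the first 20 lines.
head.lines <- readLines(tmp, n = 20)
header.idx <- grep("Date/Time,Unit,Value", head.lines, fixed = TRUE)[1]

# Count the columns by splitting the header on the separator.
ncols <- length(strsplit(head.lines[header.idx], ",", fixed = TRUE)[[1]])

# Skip everything before the header and drop columns 4..ncols via "NULL".
dat <- read.table(tmp, sep = ",", header = TRUE, skip = header.idx - 1,
                  colClasses = c("character", "character", "numeric",
                                 rep("NULL", ncols - 3)))
str(dat)
unlink(tmp)
```

Note that read.table's default check.names = TRUE turns the header Date/Time into the column name Date.Time.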

score 1 · Accepted Answer

If you know the separator, it is easy to work out how many columns you have. You can use a construct like this for each file:

my.read.table <- function (file, sep=",", colClasses3=rep('double', 3), ...) {

  first.line <- readLines(file, n=1)

  ## Split the first line on the separator.

  ncols <- length(strsplit(first.line, sep, fixed=TRUE)[[1]])
  ## fixed=TRUE is to avoid the need to escape the separator when splitting.

  out <- read.table(file, sep=sep,
                    colClasses=c(colClasses3, rep("NULL", ncols - 3)), ...)

  out
}

Then use it with your approach:

lapply(files, my.read.table, skip=19, header=TRUE)
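One caveat with counting columns by splitting on the separator: a quoted field that itself contains the separator is counted more than once. The sketch below is not part of the answer above. It swaps in `count.fields` from the utils package, which respects quoting the same way read.table does by default, and the sample data is made up. Since `count.fields` scans the whole file when given a file name, it may be slower than reading a single line.

```r
# Hypothetical file with no header; the 4th field contains a quoted comma.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  '1,2,3,"a,b",5',
  '4,5,6,"c,d",8'
), tmp)

# Naive count: splitting on "," also splits inside the quoted field.
first.line <- readLines(tmp, n = 1)
naive <- length(strsplit(first.line, ",", fixed = TRUE)[[1]])

# Quote-aware count: number of fields on the first line.
ncols <- count.fields(tmp, sep = ",")[1]

# Keep the first three columns and drop the rest with "NULL".
out <- read.table(tmp, sep = ",",
                  colClasses = c(rep("numeric", 3), rep("NULL", ncols - 3)))
print(c(naive = naive, ncols = ncols))
print(out)
unlink(tmp)
```

If the files never contain quoted separators, the simpler strsplit approach gives the same count.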

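Also note that you will need to consider whether the files contain row names and column names, because read.table applies some heuristics when they are present. The solution above assumes there are none. Read about colClasses in ?read.table to adapt this further to your needs.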
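To illustrate the row-name heuristic mentioned here: when the header line has one fewer field than the data lines, read.table treats the first data column as row names. The input below is made up. In that situation a column count taken from the header line would be one short of the number of columns actually in the data.

```r
# Hypothetical input: the header has 2 fields, each data line has 3.
txt <- c("a,b",
         "r1,1,2",
         "r2,3,4")

d <- read.table(text = txt, sep = ",", header = TRUE)

# Fields in the header line vs. fields in the first data line.
header.fields <- length(strsplit(txt[1], ",", fixed = TRUE)[[1]])
data.fields   <- length(strsplit(txt[2], ",", fixed = TRUE)[[1]])

print(d)
print(rownames(d))
print(c(header = header.fields, data = data.fields))
```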
