r - How to skip extra lines before the header of a tab-delimited file in R

Question

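The software I am using produces log files with a variable number of lines of summary information followed by a large amount of tab-delimited data. I am trying to write a function that reads the data from these log files into a data frame, ignoring the summary information. The summary information never contains a tab, so the following function works: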

read.parameters <- function(file.name, ...){
  lines <- scan(file.name, what="character", sep="\n")
  first.line <- min(grep("\\t", lines))
  return(read.delim(file.name, skip=first.line-1, ...))
}

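However, these log files are quite big, so reading the file twice is very slow. Surely there is a better way?

Edited to add:

Marek suggested using a textConnection object. The approach he suggested in his answer failed on a large file, but the following works: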

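read.parameters <- function(file.name, ...){
  conn <- file(file.name, "r")
  on.exit(close(conn))
  repeat{
    line <- readLines(conn, 1)
    # readLines returns character(0) at end of file; stop rather than loop forever
    if (length(line) == 0) stop("no tab-delimited line found in ", file.name)
    if (length(grep("\\t", line))) {
      # put the header line back so read.delim sees it
      pushBack(line, conn)
      break
    }
  }
  df <- read.delim(conn, ...)
  return(df)
}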

Edited again: thanks to Marek for a further improvement to the function above.

score 1 · Accepted Answer

You don't need to read the file twice. Use textConnection on the result of the first read.

read.parameters <- function(file.name, ...){
  lines <- scan(file.name, what="character", sep="\n") # the question had "tmp.log" here; presumably file.name was intended
  first.line <- min(grep("\\t", lines))
  return(read.delim(textConnection(lines), skip=first.line-1, ...))
}

score 0 · Accepted Answer

If you can be sure that the header information will not exceed N lines, e.g. N = 200, then try:

scan(..., nlines = N)

That way you won't re-read more than N lines.
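
The capped-scan idea above can be sketched as a complete function. This is a minimal sketch, not the answerer's code: the name read.parameters.capped and the max.header argument are hypothetical, and readLines(n = ...) is swapped in for scan(..., nlines = N) because it keeps blank lines, so the line count matches what the skip argument of read.delim expects.

```r
# Hypothetical helper: inspect at most max.header lines to find the first
# line containing a tab, then let read.delim skip everything before it.
read.parameters.capped <- function(file.name, max.header = 200, ...) {
  # Read only the top of the file; blank lines are kept, so indices are
  # true line numbers within the file.
  head.lines <- readLines(file.name, n = max.header)
  tab.lines <- grep("\t", head.lines, fixed = TRUE)
  if (length(tab.lines) == 0) {
    stop("no tab-delimited line found in the first ", max.header, " lines")
  }
  # Only the first max.header lines are read twice; the bulk is read once.
  read.delim(file.name, skip = tab.lines[1] - 1, ...)
}
```

Called as read.parameters.capped("tmp.log"), this would read at most 200 lines twice. The assumption that the summary block never exceeds max.header lines has to hold; otherwise the function stops with an error rather than guessing.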
