r - Find a specific character in a line and read it?

Question

I have a small problem with this. Every file I have starts with a header that looks like this:

*COUNTRY : US                                     *
***************************************************
*CAPITAL : Washington, D.C, district of columbia  *
*Language: English                                *  
***************************************************
V1 V2 V3

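After that come my data variables (V1, V2, ...). What I want to do is extract only the language (English, French, Spanish, ...) from each file and use it in my plotting script. When I read a file, I skip these header lines in read.table, because otherwise read.table would not work. I hope you understand my question.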

score 4 · Accepted Answer

You can use something like this:

## File name
filename <- "/tmp/temp.txt"
## Read the first 5 lines
header <- readLines(filename, n = 5)
## Grep the language field in these lines
result <- grep("^\\*Language: .*$", header, value = TRUE)
## Extract the language string, dropping the padding and the closing "*"
trimws(sub("^\\*Language:(.*)\\*[[:space:]]*$", "\\1", result))

Note that if the Language field is always on line 4, you can simply do the following:

filename <- "/tmp/temp.txt"
header <- readLines(filename, n = 4)
## Strip the label, the padding, and the closing "*" from line 4
trimws(sub("^\\*Language:(.*)\\*[[:space:]]*$", "\\1", header[4]))
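Since the question is about doing this for every file, the extraction can be wrapped in a small function. The following is only a sketch: the name read_language, the n_header argument, and the temporary demo file are made up for illustration, and it assumes each header has the same five-line layout as the one in the question.

```r
## Hypothetical helper: return the Language field from a file's header,
## or NA if no such line is found in the first n_header lines.
read_language <- function(path, n_header = 5L) {
  header <- readLines(path, n = n_header)
  hit <- grep("^\\*Language:", header, value = TRUE)
  if (length(hit) == 0L) return(NA_character_)
  ## Drop the label and the closing "*", then trim the padding
  trimws(sub("^\\*Language:(.*)\\*[[:space:]]*$", "\\1", hit[1]))
}

## Demo on a temporary file that mimics the header from the question
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "*COUNTRY : US                                     *",
  "***************************************************",
  "*CAPITAL : Washington, D.C, district of columbia  *",
  "*Language: English                                *  ",
  "***************************************************",
  "V1 V2 V3",
  "1 2 3"
), tmp)

lang <- read_language(tmp)
## Skip the 5 header lines, then read the column names and the data
dat <- read.table(tmp, skip = 5, header = TRUE)
```

The value in lang can then be passed to the plotting call, for example as the main title, while dat holds the data read with the header lines skipped.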

score 1 · Accepted Answer

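I would open a file connection, read the header data, and then continue with read.table to read the rest of the file. That way you only read the file once. Something like this: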

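f <- file( "data.txt", open = "r" )
language <- NULL
while( TRUE ){
    line <- readLines( f, 1L )
    ## Stop at end of file instead of looping forever
    if( length(line) == 0L ) break
    if( grepl( "^[*]Language:", line ) ){
        ## Capture the text between the label and the closing "*", then trim it
        language <- trimws( sub( "^[*]Language:(.*)[*][[:space:]]*$", "\\1", line ) )
    }
    ## The row of asterisks after the Language line ends the header
    if( !is.null(language) && grepl("^[*][*]", line) ) break
}
## The connection is now positioned at the column-name line
data <- read.table( f, header = TRUE )
close( f )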
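The same single-pass idea can be packaged as a function that returns both the language and the data. This is a minimal sketch, not a definitive implementation: the function name read_with_language and the demo file are hypothetical, and it assumes the header ends with a row of asterisks right after the Language line, as in the question.

```r
## Hypothetical helper: read the header and the data in one pass.
read_with_language <- function(path) {
  con <- file(path, open = "r")
  on.exit(close(con))  # close the connection even if an error occurs
  language <- NA_character_
  repeat {
    line <- readLines(con, n = 1L)
    if (length(line) == 0L) stop("reached end of file before the header ended")
    if (grepl("^[*]Language:", line)) {
      language <- trimws(sub("^[*]Language:(.*)[*][[:space:]]*$", "\\1", line))
    }
    ## The row of asterisks after the Language line closes the header
    if (!is.na(language) && grepl("^[*][*]", line)) break
  }
  ## read.table continues from the current position of the open connection
  list(language = language, data = read.table(con, header = TRUE))
}

## Demo on a temporary file that mimics the header from the question
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "*COUNTRY : US                                     *",
  "***************************************************",
  "*CAPITAL : Washington, D.C, district of columbia  *",
  "*Language: English                                *  ",
  "***************************************************",
  "V1 V2 V3",
  "1 2 3",
  "4 5 6"
), tmp)

res <- read_with_language(tmp)
```

Using on.exit here means the connection is released even when the header is malformed and the function stops early, which the plain loop does not guarantee.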
