string - Loop through each line, store each word in a row, and create a data frame in R

Question

I have the following file:

[1]/tI /tam /tCharlotte   
[2]/ti /tam /tcharlotte   
[3]/tYou /tare /tsmart  
[4]/tyou /tare /tsmart

I would like the output data frame to have the following form:

word      gloss  
I         i  
am        am      
Charlotte charlotte    
You       you    
are       are    
smart     smart

Is it possible to write code for this? Do I need to split the file on tabs?

score 0 · Accepted Answer

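Your question isn't completely clear. For example,

Are the numbers [1], [2], ... actually in your file?
Are the even lines just lower-case versions of the odd lines?

Ignoring the numbers and assuming the odd and even lines can differ, one solution is: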

##Read in the data. 
tmp = read.table(textConnection("/tI /tam /tCharlotte   
/ti /tam /tcharlotte   
/tYou /tare /tsmart  
/tyou /tare /tsmart"), sep="\n", stringsAsFactors=FALSE)

##Take the odd rows
##gsub: remove white space
##strsplit: split the string on the literal marker "/t"
##unlist: go from a list to a vector
c1 = unlist(
    strsplit(
        gsub(" ", "", tmp[seq(1,nrow(tmp), 2),]), "/t"))

##Ditto the even rows
c2 = unlist(
    strsplit(
        gsub(" ", "", tmp[seq(2,nrow(tmp), 2),]), "/t"))

This gives us two vectors that we can put in a data frame:

dd = data.frame(c1 = c1, c2 = c2)

I suppose you don't want the empty rows, so just remove them:

dd[apply(dd, 1, function(i) sum(nchar(i))>0),]
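
If the `[1]`, `[2]`, ... markers are in fact part of the file, the same idea still applies once they are stripped. The following is only a sketch under that assumption: the `lines` vector is a hypothetical stand-in for what `readLines()` would return for your file, and `split_words` is a helper name made up for this example. It also names the columns `word` and `gloss` as requested in the question.

```r
# Hypothetical input: the lines exactly as shown in the question,
# including the "[n]" prefixes and trailing spaces.
lines <- c("[1]/tI /tam /tCharlotte   ",
           "[2]/ti /tam /tcharlotte   ",
           "[3]/tYou /tare /tsmart  ",
           "[4]/tyou /tare /tsmart")

# Drop a leading "[n]" marker, if present, then remove all spaces.
clean <- gsub(" ", "", sub("^\\[[0-9]+\\]", "", lines), fixed = TRUE)

# Split on the literal two-character marker "/t" and drop empty pieces.
split_words <- function(x) {
  parts <- unlist(strsplit(x, "/t", fixed = TRUE))
  parts[nzchar(parts)]
}

# Odd lines hold the words, even lines hold the glosses.
odd  <- clean[seq(1, length(clean), by = 2)]
even <- clean[seq(2, length(clean), by = 2)]

result <- data.frame(word  = split_words(odd),
                     gloss = split_words(even),
                     stringsAsFactors = FALSE)
print(result)
```

This assumes every odd line has exactly as many words as the even line that follows it; otherwise `data.frame()` will complain about differing lengths.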

score 0 · Accepted Answer

This solution is similar to @csgillespie's, but does everything in a single command (once the data has been read).

Read the data:

dat <- read.table(text = "/tI /tam /tCharlotte   
/ti /tam /tcharlotte   
/tYou /tare /tsmart  
/tyou /tare /tsmart", stringsAsFactors = FALSE)

Create the data frame:

structure(
 as.data.frame(
  lapply(
   lapply(list(c(TRUE, FALSE), c(FALSE, TRUE)),
          function(y) lapply(strsplit(
                              apply(dat, 1, "paste", collapse = ""), "/t"),
                             function(x) x[nchar(x) > 0])[y]),
   unlist)),
 .Names = c("word", "gloss"))
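
The nested call above is dense, so here is a rough step-by-step equivalent that may be easier to follow. It is a sketch, not a replacement for the answer: the intermediate names `rows`, `pieces` and `res` are invented for illustration, and it assumes the same four-line input.

```r
dat <- read.table(text = "/tI /tam /tCharlotte
/ti /tam /tcharlotte
/tYou /tare /tsmart
/tyou /tare /tsmart", stringsAsFactors = FALSE)

# 1. Paste the columns of each row back into one string,
#    e.g. "/tI/tam/tCharlotte".
rows <- apply(dat, 1, paste, collapse = "")

# 2. Split each string on "/t" and drop the empty leading piece.
pieces <- lapply(strsplit(rows, "/t"), function(x) x[nchar(x) > 0])

# 3. Logical indices are recycled, so c(TRUE, FALSE) selects
#    elements 1, 3, ... and c(FALSE, TRUE) selects elements 2, 4, ...
word  <- unlist(pieces[c(TRUE, FALSE)], use.names = FALSE)
gloss <- unlist(pieces[c(FALSE, TRUE)], use.names = FALSE)

res <- data.frame(word = word, gloss = gloss, stringsAsFactors = FALSE)
print(res)
```

The recycled logical index in step 3 is what `list(c(TRUE, FALSE), c(FALSE, TRUE))` achieves in the one-command version.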
