
I have three text documents stored as a list of lists called "dlist":

dlist <- structure(list(name = c("a", "b", "c"), text = list(c("the", "quick", "brown"), c("fox", "jumps", "over", "the"), c("lazy", "dog"))), .Names = c("name", "text"))

In my head I find it helpful to picture dlist like this:

   name  text
1  a     c("the", "quick", "brown")
2  b     c("fox", "jumps", "over", "the")
3  c     c("lazy", "dog")

How can this be reshaped into the form below? The idea is to plot it, so something that can be melted for ggplot2 would be good.

  name  text
1    a   the
2    a quick
3    a brown
4    b   fox
5    b jumps
6    b  over
7    b   the
8    c  lazy
9    c   dog

That is one row per word, giving the word and its parent document.

I have tried:

> expand.grid(dlist)
  name                  text
1    a     the, quick, brown
2    b     the, quick, brown
3    c     the, quick, brown
4    a fox, jumps, over, the
5    b fox, jumps, over, the
6    c fox, jumps, over, the
7    a             lazy, dog
8    b             lazy, dog
9    c             lazy, dog

> sapply(seq(1,3), function(x) (expand.grid(dlist$name[[x]], dlist$text[[x]])))
     [,1]     [,2]     [,3]    
Var1 factor,3 factor,4 factor,2
Var2 factor,3 factor,4 factor,2

> unlist(dlist)
  name1   name2   name3   text1   text2   text3   text4 
    "a"     "b"     "c"   "the" "quick" "brown"   "fox" 
  text5   text6   text7   text8   text9 
"jumps"  "over"   "the"  "lazy"   "dog"

> sapply(seq(1,3), function(x) (cbind(dlist$name[[x]], dlist$text[[x]])))
[[1]]
     [,1] [,2]   
[1,] "a"  "the"  
[2,] "a"  "quick"
[3,] "a"  "brown"

[[2]]
     [,1] [,2]   
[1,] "b"  "fox"  
[2,] "b"  "jumps"
[3,] "b"  "over" 
[4,] "b"  "the"  

[[3]]
     [,1] [,2]  
[1,] "c"  "lazy"
[2,] "c"  "dog" 

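In fairness, I am quite confused by the various apply and plyr functions and am not sure where to start. I have never seen a result like the one from the first "sapply" attempt above, and I do not understand it.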


3 Answers


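If you convert your dlist to a named list (a better-suited structure, in my opinion), you can use stack() to get the two-column data.frame you want.

(The rev() and setNames() calls in the second line are just one of many ways to adjust the column order and names to match the desired output shown in the question.)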

x <- setNames(dlist$text, dlist$name)
setNames(rev(stack(x)),  c("name", "text"))
#   name  text
# 1    a   the
# 2    a quick
# 3    a brown
# 4    b   fox
# 5    b jumps
# 6    b  over
# 7    b   the
# 8    c  lazy
# 9    c   dog
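
To make each step visible, here is the same approach split into pieces. This is only a sketch: the intermediate names stacked, long and counts are made up for illustration, and the as.character() call is there on the assumption that older R versions may return the words as a factor.

```r
dlist <- list(name = c("a", "b", "c"),
              text = list(c("the", "quick", "brown"),
                          c("fox", "jumps", "over", "the"),
                          c("lazy", "dog")))

# Name each word vector after its document.
x <- setNames(dlist$text, dlist$name)

# stack() returns two columns: `values` (the words) and
# `ind` (a factor built from the list names).
stacked <- stack(x)

# Reverse the column order, then rename to match the question.
long <- setNames(rev(stacked), c("name", "text"))
long$text <- as.character(long$text)

# Words per document, e.g. as input for a bar chart.
counts <- table(long$name)
```

Because long has one row per word, it should already be in the shape ggplot2 expects; with ggplot2 loaded, something like ggplot(long, aes(x = name)) + geom_bar() would plot the same counts.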
answered 2013-05-01T21:14:04.430

Another solution, perhaps more general:

do.call(rbind, do.call(mapply, c(dlist, FUN = data.frame, SIMPLIFY = FALSE)))

#     name  text
# a.1    a   the
# a.2    a quick
# a.3    a brown
# b.1    b   fox
# b.2    b jumps
# b.3    b  over
# b.4    b   the
# c.1    c  lazy
# c.2    c   dog
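
The nested do.call() can be hard to read, so here is a sketch of what I believe is the same idea written with Map(). It is close to the last attempt in the question, except that each piece is a data frame and the pieces are then bound together with rbind. The names pieces and long are only for illustration.

```r
dlist <- list(name = c("a", "b", "c"),
              text = list(c("the", "quick", "brown"),
                          c("fox", "jumps", "over", "the"),
                          c("lazy", "dog")))

# Build one small data frame per document. Map() walks over the
# names and the word vectors in parallel; the single name is
# recycled to the length of its word vector.
pieces <- Map(function(name, text) {
  data.frame(name = name, text = text, stringsAsFactors = FALSE)
}, dlist$name, dlist$text)

# Bind the per-document data frames into one long data frame.
long <- do.call(rbind, pieces)

# Optional: replace the a.1, a.2, ... row names with 1, 2, ...
rownames(long) <- NULL
```

This generalizes to more columns: any extra element of the same length as name could be passed to Map() as another argument.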
answered 2013-05-01T23:54:10.660

Josh's answer is much sweeter, but I thought I'd throw my hat in the ring.

dlist <- structure(list(name = c("a", "b", "c"), 
    text = list(c("the", "quick", "brown"), 
    c("fox", "jumps", "over", "the"), c("lazy", "dog"))), 
    .Names = c("name", "text"))

lens <- sapply(unlist(dlist[-1], recursive = FALSE), length)

data.frame(name = rep(dlist[[1]], lens), text = unlist(dlist[-1]), row.names = NULL)

##   name  text
## 1    a   the
## 2    a quick
## 3    a brown
## 4    b   fox
## 5    b jumps
## 6    b  over
## 7    b   the
## 8    c  lazy
## 9    c   dog

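That being said, a list of lists is a somewhat awkward way to store this. A list of vectors (particularly a named list of vectors) would be easier to work with.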
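
As a rough illustration of that last point, here is the same data held as a named list of vectors. The name docs is hypothetical, and lengths() assumes R 3.2.0 or later.

```r
# One character vector per document, named by document.
docs <- list(a = c("the", "quick", "brown"),
             b = c("fox", "jumps", "over", "the"),
             c = c("lazy", "dog"))

# A single document can be looked up by name.
words_b <- docs$b

# Number of words in each document.
n_words <- lengths(docs)

# Repeat each document name once per word and flatten the words.
long <- data.frame(name = rep(names(docs), n_words),
                   text = unlist(docs, use.names = FALSE),
                   stringsAsFactors = FALSE)
```

With the names attached to the vectors themselves, there is no separate name element to keep in sync with the text element.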

answered 2013-05-01T23:14:04.500