11

I have a data frame in column form (the input):

Id  Comment
xc545   Ronald is a great person 
g6548   Hero worship is bad

I need the output in the following form:

Id  Words 
xc545   Ronald
xc545   is
xc545   a
xc545   great
xc545   person
g6548   Hero
g6548   worship
g6548   is
g6548   bad

I need an R statement that does this.

Here is what I have tried:

result<-lapply(input,function(x)strsplit(x[2]," "))

However, this returns only one record.


3 Answers

9

Assuming DF is your data.frame, maybe:

> List <- strsplit(DF$Comment, " ")
> data.frame(Id=rep(DF$Id, sapply(List, length)), Words=unlist(List))
     Id   Words
1 xc545  Ronald
2 xc545      is
3 xc545       a
4 xc545   great
5 xc545  person
6 g6548    Hero
7 g6548 worship
8 g6548      is
9 g6548     bad

Note that my answer only works when each pair of words is separated by exactly one plain space.
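If the comments may contain repeated spaces, tabs, or leading and trailing whitespace, one possible variation is to trim each string and split on runs of whitespace. The following is a minimal sketch of that idea, not part of the original answer; the sample DF with irregular spacing is hypothetical and made up for illustration.

```r
# Hypothetical input with irregular spacing (not from the original question).
DF <- data.frame(
  Id = c("xc545", "g6548"),
  Comment = c("Ronald  is a   great person ", " Hero worship\tis bad"),
  stringsAsFactors = FALSE
)

# Trim both ends, then split on runs of whitespace instead of a single space.
List <- strsplit(trimws(DF$Comment), "[[:space:]]+")

# Repeat each Id once per word and flatten the list of words.
result <- data.frame(
  Id = rep(DF$Id, lengths(List)),
  Words = unlist(List),
  stringsAsFactors = FALSE
)
print(result)
```

Trimming first avoids the empty leading string that strsplit would otherwise produce when a comment starts with whitespace. trimws and lengths both require R 3.2.0 or later.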

Answered 2013-06-04T15:21:25.090
4

A data.table solution inspired by this:

library(data.table)
dt = data.table(df)
dt[,c(Words=strsplit(Comment, " ", fixed = TRUE)), by = Id]
      Id      V1
1: xc545  Ronald
2: xc545      is
3: xc545       a
4: xc545   great
5: xc545  person
6: g6548    Hero
7: g6548 worship
8: g6548      is
9: g6548     bad
Answered 2013-06-04T15:30:33.887
3

Using scan, tapply, and stack:

d <- read.table(text='Id  Comment
xc545   "Ronald is a great person"
g6548   "Hero worship is bad"', header=TRUE, as.is=TRUE)

stack(tapply(d$Comment, d$Id, function(x) scan(text=x, what='')))
#    values   ind
# 1    Hero g6548
# 2 worship g6548
# 3      is g6548
# 4     bad g6548
# 5  Ronald xc545
# 6      is xc545
# 7       a xc545
# 8   great xc545
# 9  person xc545
Answered 2013-06-04T15:39:16.567