r - Convert sentences into words in R

Question

I have a data frame with the following columns (the input):

Id  Comment
xc545   Ronald is a great person 
g6548   Hero worship is bad

I need the output in this form:

Id  Words 
xc545   Ronald
xc545   is
xc545   a
xc545   great
xc545   person
g6548   Hero
g6548   worship
g6548   is
g6548   bad

I need an R statement that does this.

Here is what I tried:

result<-lapply(input,function(x)strsplit(x[2]," "))

However, this returns only a single record.

score 9 · Accepted Answer

Assuming DF is your data.frame, one option is:

> List <- strsplit(DF$Comment, " ")
> data.frame(Id=rep(DF$Id, sapply(List, length)), Words=unlist(List))
     Id   Words
1 xc545  Ronald
2 xc545      is
3 xc545       a
4 xc545   great
5 xc545  person
6 g6548    Hero
7 g6548 worship
8 g6548      is
9 g6548     bad

Note that my answer only works when there is exactly one plain space between each pair of words.
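
If the comments may contain repeated spaces, tabs, or leading and trailing whitespace, one possible variation is to trim each comment and split on a whitespace regular expression instead of a single space. The following is only a sketch: it assumes `DF` has the `Id` and `Comment` columns from the question, and the irregularly spaced sample data is made up for illustration.

```r
# Hypothetical sample data with irregular spacing (not from the question).
DF <- data.frame(
  Id = c("xc545", "g6548"),
  Comment = c("Ronald  is a great person ", " Hero worship\tis bad"),
  stringsAsFactors = FALSE
)

# Trim leading/trailing whitespace, then split on runs of whitespace.
# as.character() guards against Comment having been read in as a factor.
List <- strsplit(trimws(as.character(DF$Comment)), "\\s+")

# Repeat each Id once per word and flatten the list of words.
result <- data.frame(
  Id = rep(DF$Id, lengths(List)),
  Words = unlist(List),
  stringsAsFactors = FALSE
)
print(result)
```

With this sample input, `result` should contain the same nine rows as the output above, because the extra whitespace no longer produces empty strings.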

score 4 · Accepted Answer

A solution inspired by this, using data.table:

library(data.table)
dt = data.table(df)
dt[,c(Words=strsplit(Comment, " ", fixed = TRUE)), by = Id]
      Id      V1
1: xc545  Ronald
2: xc545      is
3: xc545       a
4: xc545   great
5: xc545  person
6: g6548    Hero
7: g6548 worship
8: g6548      is
9: g6548     bad
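
The result column above is named `V1` rather than `Words`. One way to get a named column, sketched here under the assumption that `Comment` is a character column, is to return a named `list()` from `j`. The object names `dt` and `words` are only illustrative.

```r
library(data.table)

# Sample data from the question; Comment is kept as character.
df <- data.frame(
  Id = c("xc545", "g6548"),
  Comment = c("Ronald is a great person", "Hero worship is bad"),
  stringsAsFactors = FALSE
)

dt <- as.data.table(df)

# Returning a named list from j sets the name of the output column.
words <- dt[, list(Words = unlist(strsplit(Comment, " ", fixed = TRUE))), by = Id]
print(words)
```

Grouping with `by = Id` keeps the groups in order of first appearance, so the rows for `xc545` come before those for `g6548`.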

score 3 · Accepted Answer

Using scan, tapply and stack:

d <- read.table(text='Id  Comment
xc545   "Ronald is a great person"
g6548   "Hero worship is bad"', header=TRUE, as.is=TRUE)

stack(tapply(d$Comment, d$Id, function(x) scan(text=x, what='')))
#    values   ind
# 1    Hero g6548
# 2 worship g6548
# 3      is g6548
# 4     bad g6548
# 5  Ronald xc545
# 6      is xc545
# 7       a xc545
# 8   great xc545
# 9  person xc545
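
Note that the output above has columns named `values` and `ind`, and that the groups are sorted by Id rather than kept in input order. If the layout requested in the question matters, a possible follow-up step looks like this. It is a sketch only: the names `s` and `out` are illustrative, and `quiet=TRUE` is added just to suppress the messages from `scan`.

```r
# Same sample data as in the answer above.
d <- read.table(text='Id  Comment
xc545   "Ronald is a great person"
g6548   "Hero worship is bad"', header=TRUE, as.is=TRUE)

# quiet=TRUE suppresses scan's "Read N items" messages.
s <- stack(tapply(d$Comment, d$Id, function(x) scan(text=x, what='', quiet=TRUE)))

# Rename and reorder the columns to match the requested layout.
out <- data.frame(Id = as.character(s$ind), Words = as.character(s$values),
                  stringsAsFactors = FALSE)

# Restore the original Id order; ties keep their existing order,
# so the words of each comment stay in sequence.
out <- out[order(match(out$Id, d$Id)), ]
rownames(out) <- NULL
print(out)
```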
