r - Creating dummy variables from a list

Question

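I am trying to create dummy variables to append to a data frame, based on whether a specific column of the frame contains particular words. The column looks something like this: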

 dumcol = c("good night moon", "good night room", "good morning room", "hello moon")

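I want to create the dummy variables based on which words each row contains. For example, the first row contains "good", "night", and "moon", but not "room", "morning", or "hello".

So far I have been doing this in a very crude way: creating a matrix of 0 values of the appropriate size, then using a for loop like this: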

result = matrix(0, ncol=6, nrow=4)
wordlist = unique(unlist(strsplit(dumcol, " ")))
for (i in 1:6)
{ result[grep(wordlist[i], dumcol), i] = 1 }

or something like that. I'm guessing there is a faster / more efficient way to do this. Any suggestions?

score 3 · Accepted Answer

You could try:

library(tm)
myCorpus <- Corpus(VectorSource(dumcol))
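# wordLengths = c(1, Inf) keeps words of any length (recent tm versions default to c(3, Inf))
myTDM <- TermDocumentMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))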
as.matrix(myTDM)

This gives:

#         Docs
#Terms     1 2 3 4
#  good    1 1 1 0
#  hello   0 0 0 1
#  moon    1 0 0 1
#  morning 0 0 1 0
#  night   1 1 0 0
#  room    0 1 1 0

If you want the dummy variables in columns, you can use DocumentTermMatrix instead:

#    Terms
#Docs good hello moon morning night room
#   1    1     0    1       0     1    0
#   2    1     0    0       0     1    1
#   3    1     0    0       1     0    1
#   4    0     1    1       0     0    0
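
The call itself is not shown above, so here is a minimal sketch of what it would probably look like. The names `myDTM` and `dtm_mat` are made up for illustration, and the `wordLengths` control is an assumption about current versions of tm rather than part of the original answer:

```r
library(tm)

# Example column from the question
dumcol <- c("good night moon", "good night room", "good morning room", "hello moon")

myCorpus <- Corpus(VectorSource(dumcol))

# Documents in rows, terms in columns; wordLengths = c(1, Inf) is assumed
# here so that short words are not dropped
myDTM <- DocumentTermMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))

# Convert the sparse matrix to an ordinary matrix
dtm_mat <- as.matrix(myDTM)
dtm_mat
```

Note that the cells are term counts, so a word that occurs twice in a row would give 2 rather than 1.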

score 3 · Accepted Answer

Try

 library(qdapTools)
 mtabulate(strsplit(dumcol, ' '))
 #    good hello moon morning night room
 #1    1     0    1       0     1    0
 #2    1     0    0       0     1    1
 #3    1     0    0       1     0    1
 #4    0     1    1       0     0    0

Or

 library(splitstackshape)
 cSplit_e(as.data.frame(dumcol), 'dumcol', sep=' ', 
                      type='character', fill=0, drop=TRUE)
 #  dumcol_good dumcol_hello dumcol_moon dumcol_morning dumcol_night dumcol_room
 #1           1            0           1              0            1           0
 #2           1            0           0              0            1           1
 #3           1            0           0              1            0           1
 #4           0            1           1              0            0           0

score 2 · Accepted Answer

I would do

sdum <- strsplit(dumcol," ")
us   <- unique(unlist(sdum))
res  <- sapply(sdum,function(x)table(factor(x,levels=us)))
#         [,1] [,2] [,3] [,4]
# good       1    1    1    0
# night      1    1    0    0
# moon       1    0    0    1
# room       0    1    1    0
# morning    0    0    1    0
# hello      0    0    0    1

The result can be transposed with t(res) to put the dummy variables in columns (the R convention).
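
Since the question asks for columns to append to a data frame, here is a rough base-R sketch of that last step. It is a variation on the code above, not the same thing: it uses `%in%` instead of `table()`, so every cell is strictly 0/1 even if a word repeats within a row. The data frame `df` and the name `dummies` are hypothetical:

```r
# Example column from the question, wrapped in a hypothetical data frame
dumcol <- c("good night moon", "good night room", "good morning room", "hello moon")
df <- data.frame(dumcol, stringsAsFactors = FALSE)

sdum <- strsplit(dumcol, " ")
us   <- unique(unlist(sdum))

# vapply returns a words x rows matrix, so transpose it to get
# one row per original row and one column per word
dummies <- t(vapply(sdum, function(x) as.integer(us %in% x), integer(length(us))))
colnames(dummies) <- us

# Append the 0/1 columns to the data frame
df <- cbind(df, dummies)
df
```

Matching on the split words also avoids the substring matches that `grep()` on the raw strings can produce, e.g. a word like "moon" matching "moonlight".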
