r - Text mining with the R package tm: "Error in if (vectorized && (length <= 0))"

Question

I'm trying to use the tm package on a data frame that contains text, but this error keeps coming up: "Error in if (vectorized && (length <= 0)) stop("vectorized sources must have positive length") : missing value where TRUE/FALSE needed"

I have a data frame that looks like this:

     person sex adult                                 state code
1         sam   m     0         Computer is fun. Not too fun.   K1
2        greg   m     0               No it's not, it's dumb.   K2
3     teacher   m     1                    What should we do?   K3
4         sam   m     0                  You liar, it stinks!   K4
5        greg   m     0               I am telling the truth!   K5
6       sally   f     0                How can we be certain?   K6
7        greg   m     0                      There is no way.   K7
8         sam   m     0                       I distrust you.   K8
9       sally   f     0           What are you talking about?   K9
10 researcher   f     1         Shall we move on?  Good then.  K10
11       greg   m     0 I'm hungry.  Let's eat.  You already?  K11

This is the only code I'm using:

library(tm)
texts <- as.data.frame(texts)
mycorpus<- Corpus(DataframeSource(texts))

Does anyone know what is going wrong here? Thanks in advance!

score 0 · Accepted Answer

Hopefully this is what you're looking for:

xkcd.df <- read.csv(file.path(path, datafiles))
xkcd.corpus <- Corpus(DataframeSource(data.frame(xkcd.df[, 3])))
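One caveat, as far as I know: newer releases of tm expect the data frame passed to DataframeSource() to have a first column named doc_id and a second column named text, with any further columns kept as document metadata. Below is a minimal sketch under that assumption, using three rows copied from the question. The name docs is just an illustrative helper, and using the code column as the document id is my own choice, not something the asker specified.

```r
library(tm)

# Three sample rows copied from the data frame in the question
texts <- data.frame(
  person = c("sam", "greg", "teacher"),
  state  = c("Computer is fun. Not too fun.",
             "No it's not, it's dumb.",
             "What should we do?"),
  code   = c("K1", "K2", "K3"),
  stringsAsFactors = FALSE
)

# Assumption: the installed tm version wants columns named doc_id and text.
# 'docs' is a hypothetical helper data frame built only for this purpose.
docs <- data.frame(
  doc_id = texts$code,    # unique identifier for each document
  text   = texts$state,   # the column that holds the actual text
  stringsAsFactors = FALSE
)

mycorpus <- Corpus(DataframeSource(docs))
n_docs <- length(mycorpus)  # one document per row of docs
```

If your tm version is older and does not need these column names, passing a one-column data frame as in the answer above may still work.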

score 0 · Accepted Answer

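It sounds like you need to build the corpus from your text column. (The text also appears to be merged with the state code column; if so, you will need to separate them first.) Assuming the state code column is the one you want to use with the tm package, then if I remember correctly you should pull just that column, not the whole data frame, into the corpus. Given the information you provided, your code would look like this: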

mycorpus<- Corpus(VectorSource(texts$`state code`))

If you do need to separate the text from the state code, then assuming "text" is your new column:

mycorpus<- Corpus(VectorSource(texts$text))
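Going by the data frame printed in the question, the text seems to sit in a column called state, with the K1, K2, ... values in a separate column called code. If that reading is right, a minimal sketch of the VectorSource() approach looks like this (only three rows are copied over to keep it short):

```r
library(tm)

# Three sample rows copied from the data frame in the question
texts <- data.frame(
  person = c("sam", "greg", "teacher"),
  state  = c("Computer is fun. Not too fun.",
             "No it's not, it's dumb.",
             "What should we do?"),
  code   = c("K1", "K2", "K3"),
  stringsAsFactors = FALSE
)

# Pass only the text column (a character vector), not the whole data frame;
# each element of the vector becomes one document in the corpus
mycorpus <- Corpus(VectorSource(texts$state))

n_docs <- length(mycorpus)                 # number of documents
first_doc <- as.character(mycorpus[[1]])   # text of the first document
```

VectorSource() does not care what the column is called, so the same call works with texts$text or any other character column.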
