
I am trying to fetch Twitter data and create a wordcloud, but my code fails when it creates the TermDocumentMatrix. My code is as follows:

twitter_search_data <- searchTwitter(searchString = text_to_search
                                    ,n = 500)

twitter_search_text <- sapply(twitter_search_data
                             ,function(x) x$getText())

twitter_search_corpus <- Corpus(VectorSource(twitter_search_text))

twitter_search_corpus <- tm_map(twitter_search_corpus, stripWhitespace, lazy = TRUE)

twitter_search_corpus <- tm_map(twitter_search_corpus, content_transformer(tolower), lazy = TRUE)

twitter_search_corpus <- tm_map(twitter_search_corpus, PlainTextDocument,lazy = TRUE)    

twitter_search_corpus <- tm_map(twitter_search_corpus, removePunctuation, lazy = TRUE)

twitter_search_corpus <- tm_map(twitter_search_corpus, removeNumbers, lazy = TRUE)

twitter_search_corpus <- tm_map(twitter_search_corpus, removeWords, c("the", "this", "The", "This", stopwords('english')), lazy = TRUE)

twitter_search_corpus <- tm_map(twitter_search_corpus, stemDocument, lazy = TRUE)

# Create Document Term Matrix 
tdm <- as.matrix(TermDocumentMatrix(twitter_search_corpus
                                   ,control=list(wordLengths=c(3,Inf))
                                   ))

Everything runs without errors until the TermDocumentMatrix is created. The error I get is as follows:

Warning in mclapply(x$content[i], function(d) tm_reduce(d, x$lazy$maps)) : scheduled core 1 encountered errors in user code, all values of the job will be affected
mclapply(unname(content(x)), termFreq, control) : scheduled core 1 encountered errors in user code, all values of the job will be affected
Stack trace (innermost first):
74: FUN
73: lapply
72: setNames
71: as.list.VCorpus
70: as.list
69: lapply
68: meta.VCorpus
67: meta
66: TermDocumentMatrix.VCorpus
65: TermDocumentMatrix
64: as.matrix
63: observeEventHandler
1: runApp

I have already added lazy = TRUE and content_transformer(tolower), but I still get the error.


1 Answer


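The problem seems to be where this statement is placed:

twitter_search_corpus <- tm_map(twitter_search_corpus, stripWhitespace, lazy = TRUE)

Removing punctuation, numbers, and words leaves extra whitespace in the text. So the whitespace-stripping statement above needs to be the last statement before the TermDocumentMatrix is created.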
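The reordering described above can be sketched roughly as follows. This is only a minimal illustration, not the asker's full pipeline: the `sample_text` vector is a made-up stand-in for the `searchTwitter()` results, and the `PlainTextDocument`, `stemDocument`, and `lazy = TRUE` pieces are left out purely to keep the sketch small and free of extra dependencies.

```r
library(tm)

# Hypothetical sample texts standing in for the tweets returned by searchTwitter()
sample_text <- c("This is the 1st tweet, about R!!",
                 "Another tweet: 42 words about text mining...")

corpus <- VCorpus(VectorSource(sample_text))

# Cleaning steps that can leave gaps in the text come first
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

# stripWhitespace is the last transformation before the matrix is built
corpus <- tm_map(corpus, stripWhitespace)

# Term-document matrix: one row per term (3+ characters), one column per document
tdm <- as.matrix(TermDocumentMatrix(corpus,
                                    control = list(wordLengths = c(3, Inf))))
```

If the same ordering is applied to the original code, only the position of the `stripWhitespace` line changes; the other statements stay as they are.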

Answered 2016-05-16T09:15:16.817