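I have a CSV data file whose cells contain phrases like the following:

dd <- c("hello how are you?; I am fine; hello how are you?; not too bad")

I want to use wordcloud to get the frequency of each chunk of text (the chunks are separated by ";"). However, what I get is the frequency of each individual word.

Is there a way to get the frequency of each chunk within a cell?

In this toy example, I would expect to get: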

Text                   Freq
----------------------------
hello how are you?     2
I am fine              1
not too bad            1

Thanks very much in advance.

1 Answer

FWIW, try this:

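library(wordcloud)
library(tm)
txt <- c("hello how are you?; I am fine", "hello how are you?; not too bad")
# Split on ";" and trim whitespace so that each chunk becomes a single term
semicolonTokenizer <- function(x) trimws(unlist(strsplit(as.character(x), ";", fixed = TRUE)))
# VCorpus is used because recent tm versions ignore custom tokenizers on the SimpleCorpus that Corpus() returns
tdm <- TermDocumentMatrix(VCorpus(VectorSource(txt)), list(tokenize = semicolonTokenizer, tolower = FALSE))
tab <- rowSums(as.matrix(tdm))
# min.freq = 1 because wordcloud() skips terms with frequency below 3 by default
wordcloud(names(tab), tab, min.freq = 1)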
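
If only the chunk counts are needed, a term-document matrix may be more than is required. Below is a minimal base R sketch, assuming each cell is a character string whose chunks are separated by ";". The names `cells`, `chunks`, `freq` and `freq_df` are illustrative and do not come from the question.

```r
# Hypothetical input: each element stands for one CSV cell.
cells <- c("hello how are you?; I am fine",
           "hello how are you?; not too bad")

# Split every cell on ";" and trim surrounding whitespace from each chunk.
chunks <- trimws(unlist(strsplit(cells, ";", fixed = TRUE)))

# Drop empty chunks, e.g. those produced by a trailing ";".
chunks <- chunks[nzchar(chunks)]

# Count how often each chunk occurs, most frequent first.
freq <- sort(table(chunks), decreasing = TRUE)
freq_df <- data.frame(Text = names(freq), Freq = as.vector(freq),
                      stringsAsFactors = FALSE)
print(freq_df)
```

The resulting columns could then be passed to the plot, for example `wordcloud(freq_df$Text, freq_df$Freq, min.freq = 1)`.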
answered 2015-04-29T13:56:33.700