r - Inspecting a TermDocumentMatrix to get the full list of words/terms in R

Translated from: https://stackoverflow.com/questions/43748874 2017-05-02T23:41:13.627

Viewed 4590 times

I am trying to use inspect(TermDocumentMatrix()) to get a list of word/term frequencies across text documents (in R).

Using the example code from ?TermDocumentMatrix:

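library(tm)

data("crude")
# Term-document matrix with punctuation and stopwords removed
tdm <- TermDocumentMatrix(crude, control = list(removePunctuation = TRUE,
    stopwords = TRUE))
# Document-term matrix weighted by unnormalized tf-idf
dtm <- DocumentTermMatrix(crude, control = list(weighting = function(x)
    weightTfIdf(x, normalize = FALSE), stopwords = TRUE))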

Now I can inspect these:

inspect(tdm[1:1000, 1:5])

The result is:

<<TermDocumentMatrix (terms: 1000, documents: 5)>>
Non-/sparse entries: 322/4678
Sparsity           : 94%
Maximal term length: 16
Weighting          : term frequency (tf)
Sample             :
            Docs
Terms        127 144 191 194 211
  crude        2   0   2   3   0
  demand       0   5   0   0   0
  dlrs         2   0   1   2   2
  mln          0   4   0   0   2
  oil          5  12   2   1   1
  opec         0  13   0   0   0
  price        2   1   2   2   0
  prices       3   5   0   0   0
  production   0   6   0   0   0
  said         3  11   1   1   3

However, I want a longer list of terms than the ten shown in the sample. How can I get that?

I tried myinspection = inspect(tdm[1:1000, 1:5]), but it didn't get me anywhere.
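
For reference, here is a minimal sketch of one direction that might work, assuming the printed output above is only a sample (as its "Sample" label suggests) and that converting the object to an ordinary matrix with as.matrix() exposes every row. The names m, all_terms, and freq are only illustrative and do not come from the tm documentation.

```r
library(tm)

data("crude")
tdm <- TermDocumentMatrix(crude, control = list(removePunctuation = TRUE,
    stopwords = TRUE))

# Convert the first five documents to a dense matrix: one row per term
m <- as.matrix(tdm[, 1:5])

# Character vector of every term in the matrix
all_terms <- Terms(tdm)

# Total count of each term across the five documents, most frequent first
freq <- sort(rowSums(m), decreasing = TRUE)
head(freq, 20)
```

Printing m or freq directly, or writing them out with write.csv(), would then list all terms rather than a sample. A dense matrix is fine for a corpus this small, though it may use a lot of memory on a large one.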
