dtm <- DocumentTermMatrix(reuters, control=list(wordLengths=c(1,Inf)))

I am trying to convert dtm into a term-term matrix. The following is not correct:

dtm <- dtm %*% t(dtm)

How can this be done?

4

3 Answers

2

If I understand the structure of a document-term matrix correctly (documents as rows, terms as columns), it is t(dtm) %*% dtm. See this answer.
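To see why the transpose goes on the left, here is a minimal base-R sketch. The toy matrix `m` and its document and term names are made up for illustration and are not from the question; a plain dense matrix stands in for the dtm.

```r
# Hypothetical toy document-term matrix: rows are documents, columns are terms
m <- matrix(c(1, 0, 2, 0,
              0, 1, 1, 0,
              1, 1, 0, 3),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("d1", "d2", "d3"),
                            c("oil", "price", "stock", "trade")))

# t(m) %*% m is terms x terms; m %*% t(m) is documents x documents
term_term <- t(m) %*% m
doc_doc   <- m %*% t(m)

dim(term_term)
dim(doc_doc)

# crossprod(m) computes t(m) %*% m without forming the transpose explicitly
term_term_cp <- crossprod(m)
```

With raw counts, each off-diagonal cell is the sum over documents of the product of the two terms' counts, so it is a weighted co-occurrence rather than a simple 0/1 adjacency.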

answered 2012-07-21T21:43:29.843
0

I believe the following will work (note that you are creating a boolean or adjacency matrix):

t(as.matrix(dtm)) %*% as.matrix(dtm)

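For a large dtm, as.matrix builds a dense matrix that can exhaust memory. The Matrix package can help. Note that I swap i and j in the first matrix to transpose it.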

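library(tm)
library(Matrix)

data("acq")
dtm <- DocumentTermMatrix(acq, control=list(wordLengths=c(1,Inf)))
tdm <- t(dtm)

# Terms x documents: swapping i and j of the dtm triplets transposes it
Xt <- sparseMatrix(j=dtm$i, i=dtm$j, x=dtm$v)
# Documents x terms: swapping i and j of the tdm triplets transposes it back
X <- sparseMatrix(j=tdm$i, i=tdm$j, x=tdm$v)

# Terms x terms matrix, kept sparse
Xt %*% X

# For easier viewing
(Xt %*% X)[1:20, 1:20]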
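The same idea can be tried on toy data using only the Matrix package. In this sketch the vectors `i`, `j` and `v` are hypothetical stand-ins for dtm$i, dtm$j and dtm$v, and passing `dims` explicitly is my own addition so that empty trailing rows or columns are not dropped.

```r
library(Matrix)

# Hypothetical triplets standing in for dtm$i (document), dtm$j (term), dtm$v (count)
i <- c(1, 1, 2, 2, 3, 3, 3)
j <- c(1, 3, 2, 3, 1, 2, 4)
v <- c(1, 2, 1, 1, 1, 1, 3)

# Documents x terms sparse matrix (3 documents, 4 terms)
X <- sparseMatrix(i = i, j = j, x = v, dims = c(3, 4))

# Terms x terms result; crossprod(X) is t(X) %*% X and stays sparse
tt <- crossprod(X)

dim(tt)
tt
```

Using crossprod avoids building the transposed matrix by hand, which is the step the i/j swap above performs manually.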
answered 2014-02-28T05:55:23.967
0
library(tm)

TDM <- TermDocumentMatrix(x)  # form a term-document matrix; x is your corpus

termDocMatrix <- as.matrix(TDM)  # convert the TDM into a dense matrix

termDocMatrix[termDocMatrix >= 1] <- 1  # turn the counts into a Boolean (0/1) matrix

# term adjacency matrix (terms x terms)
termMatrix <- termDocMatrix %*% t(termDocMatrix)

termMatrix[1:10, 1:10]  # inspect terms numbered 1 to 10
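A small base-R sketch of what the Boolean step changes. The toy term-document matrix `tdm_counts` and its names are invented for illustration, not taken from any corpus.

```r
# Hypothetical toy term-document matrix: rows are terms, columns are documents
tdm_counts <- matrix(c(1, 0, 1,
                       0, 1, 1,
                       2, 1, 0,
                       0, 0, 3),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(c("oil", "price", "stock", "trade"),
                                     c("d1", "d2", "d3")))

# Binarize: 1 if the term occurs in the document at all, 0 otherwise
tdm_bool <- (tdm_counts >= 1) * 1

# Each cell counts the documents that contain both terms;
# the diagonal holds each term's document frequency
co_occurrence <- tdm_bool %*% t(tdm_bool)
co_occurrence
```

Without the binarizing step, the product would weight each shared document by the product of the two terms' counts instead of counting it once.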
answered 2016-08-27T17:18:31.913