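I am trying to learn how to use the lsa package in R. My real dataset is larger than the example below, but I am using this one for the sake of reproducibility (props to the person who posted this code on his website, which is a great resource). When I run the code, I get the following error: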
Error in Ops.simple_triplet_matrix(m, 1) : Incompatible dimensions.
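# load required libraries
library(tm)   # Corpus, VectorSource, tm_map, TermDocumentMatrix
library(lsa)  # lw_logtf, gw_idf, lsa, as.textmatrix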
lsa <- function () {
# 1. Prepare mock data
text <- c("transporting food by cars will cause global warming. so we should go local.",
"we should try to convince our parents to stop using cars because it will cause global warming.",
"some food, such as mongo, requires a warm weather to grow. so they have to be transported to canada.",
"a typical Electronic Circuit can be built with a battery, a bulb, and a switch.",
"electricity flows from batteries to the bulb, just like water flows through a tube.",
"batteries have chemical energe in it. then electrons flow through a bulb to light it up.",
"birds can fly because they have feather and they are light.", "why some birds like pigeon can fly while some others like chicken cannot?",
"feather is important for birds' fly. if feather on a bird's wings is removed, this bird cannot fly.")
view <- factor(rep(c("view 1", "view 2", "view 3"), each = 3))
df <- data.frame(text, view, stringsAsFactors = FALSE)
# prepare corpus
corpus <- Corpus(VectorSource(df$text))
# corpus <- tm_map(corpus, tolower)
# corpus <- tm_map(corpus, removePunctuation)
# corpus <- tm_map(corpus, function(x) removeWords(x, stopwords("english")))
# corpus <- tm_map(corpus, stemDocument, language = "english")
corpus <- tm_map(corpus, PlainTextDocument)
# 2. MDS with raw term-document matrix compute distance matrix
td.mat <- TermDocumentMatrix(corpus)
td.mat.lsa <- lw_logtf(td.mat) * gw_idf(td.mat) # weighting
lsaSpace <- lsa(td.mat.lsa) # create LSA space
dist.mat.lsa <- dist(t(as.textmatrix(lsaSpace))) # compute distance matrix
return(dist.mat.lsa) # check distance matrix
}
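I can build the corpus without any problem, and I can convert it to a term-document matrix. The error is triggered when I define td.mat.lsa. Here is the traceback: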
4 stop("Incompatible dimensions.")
3 Ops.simple_triplet_matrix(m, 1)
2 lw_logtf(td.mat) at lsa.R#31
1 lsa()
- Why am I getting this error?
- How can I fix my code to avoid this kind of error?
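
One possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: `TermDocumentMatrix()` returns a sparse object that inherits from `simple_triplet_matrix`, and the arithmetic inside `lw_logtf()` appears to fail on that class (the `Ops.simple_triplet_matrix(m, 1)` frame). Converting to a dense matrix with `as.matrix()` before weighting should avoid the failing call. The three short documents and the names `docs`, `td.sparse`, `td.dense`, and `weighted` are made up for illustration and are not part of the original code.

```r
library(tm)   # Corpus, VectorSource, TermDocumentMatrix
library(lsa)  # lw_logtf, gw_idf

# Hypothetical miniature corpus, standing in for the mock data above
docs <- c("cars cause global warming",
          "batteries light the bulb",
          "birds fly with feathers")
corpus <- Corpus(VectorSource(docs))

# The term-document matrix is stored sparsely
td.sparse <- TermDocumentMatrix(corpus)
class(td.sparse)  # includes "simple_triplet_matrix"

# Assumption: the weighting functions expect an ordinary dense matrix,
# so convert before applying the local and global weights
td.dense <- as.matrix(td.sparse)
weighted <- lw_logtf(td.dense) * gw_idf(td.dense)

dim(weighted)  # same shape as td.dense: terms x documents
```

In the function above, the equivalent change would be `td.mat <- as.matrix(TermDocumentMatrix(corpus))`. Note that a dense matrix can use far more memory than the sparse form, which may matter for a larger dataset.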