r - Topic modeling: how can I use my fitted LDA model to predict topics for a new dataset in R?

Question

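I am using the "lda" package in R for topic modeling. I want to take a Latent Dirichlet Allocation (LDA) model I have already fitted and use it to predict topics (sets of related words in documents) for a new dataset. Along the way I came across the predictive.distribution() function. However, it takes document_sums as an input argument, and document_sums is part of the output you only get after fitting a model, so for the new documents I would have to fit a new model first. I need help understanding how to apply the existing model to a new dataset and predict its topics. Here is the example code from the documentation Jonathan Chang wrote for the package: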

# Fit a model (requires the lda package)
library(lda)
data(cora.documents)
data(cora.vocab)

K <- 10  ## Num clusters

# Arguments: documents, K, vocab, num.iterations, alpha, eta
result <- lda.collapsed.gibbs.sampler(cora.documents, K, cora.vocab, 25, 0.1, 0.1)

# Predict new words for the first two documents
predictions <- predictive.distribution(result$document_sums[, 1:2], result$topics, 0.1, 0.1)

# Use top.topic.words to show the top 5 predictions in each document.
top.topic.words(t(predictions), 5)
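For reference, the sketch below writes out the standard LDA predictive formula that a function like predictive.distribution() is built around: each document's topic proportions are estimated from its column of the K x D document_sums counts, each topic's word distribution from the K x V topics counts, and the predicted word distribution for a document is their product. It is plain base R; predict_words is a hypothetical helper, not part of the lda package, and the smoothing (adding alpha and eta to the counts) is an assumption about what predictive.distribution() does internally. The toy matrices are made up for illustration.

```r
# Hypothetical helper (not part of the lda package): standard LDA predictive
# distribution p(w | d) = sum_k theta[k, d] * phi[k, w].
# topics:        K x V matrix of topic-word counts (like result$topics)
# document_sums: K x D matrix of per-document topic counts (like result$document_sums)
predict_words <- function(document_sums, topics, alpha, eta) {
  # Smoothed topic-word distributions; each row of phi sums to 1 (K x V)
  phi <- (topics + eta) / rowSums(topics + eta)
  # Smoothed per-document topic proportions; each column of theta sums to 1 (K x D)
  theta <- sweep(document_sums + alpha, 2, colSums(document_sums + alpha), "/")
  # V x D matrix: column d is the predicted word distribution for document d
  t(phi) %*% theta
}

# Toy example: 2 topics, 3 vocabulary words, 2 documents (illustrative data)
topics <- matrix(c(5, 0, 1,
                   0, 4, 2), nrow = 2, byrow = TRUE,
                 dimnames = list(NULL, c("graph", "neural", "kernel")))
document_sums <- matrix(c(3, 1,
                          0, 4), nrow = 2, byrow = TRUE)
pred <- predict_words(document_sums, topics, alpha = 0.1, eta = 0.1)
colSums(pred)  # each column sums to 1
```

This shows why document_sums is required: without topic counts for a document there are no topic proportions to mix the topics with.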

Any help would be greatly appreciated.

Thanks and regards,

Ankit

score 2 · Accepted Answer

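I don't know how to do this in R, but take a look at the 2009 paper by Wallach et al. titled "Evaluation Methods for Topic Models". Section 4 describes three ways to compute P(z|w): one based on importance sampling, and two others called the "Chib-style estimator" and the "left-to-right estimator".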
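One simple way to get topic assignments for new documents from an already fitted model, which is not one of the estimators in the paper, is "fold-in" Gibbs sampling: the fitted topic-word counts are held fixed and only the new documents' token assignments are resampled. The base-R sketch below assumes new documents in the lda package's format (one 2-row integer matrix per document, with 0-based word indices into the same vocabulary as the fitted model in row 1 and counts in row 2) and a K x V count matrix like result$topics. fold_in is a hypothetical helper, and smoothing the topics with eta is an assumption. Its K x D result plays the role of document_sums for the new documents.

```r
# Hypothetical helper (not part of the lda package): "fold-in" collapsed Gibbs
# sampling for new documents with the fitted topics held fixed.
# new_docs: list of 2-row integer matrices in the lda package's format
#           (row 1 = 0-based word index into the SAME vocabulary, row 2 = count)
# topics:   K x V topic-word count matrix from the fitted model (e.g. result$topics)
fold_in <- function(new_docs, topics, alpha, eta, num_iterations = 50) {
  K <- nrow(topics)
  # Fixed, smoothed topic-word probabilities phi[k, v]
  phi <- (topics + eta) / rowSums(topics + eta)
  doc_sums <- matrix(0L, nrow = K, ncol = length(new_docs))
  for (d in seq_along(new_docs)) {
    # Expand (word, count) columns into one entry per token, 1-based indices
    words <- rep(new_docs[[d]][1, ] + 1L, new_docs[[d]][2, ])
    if (length(words) == 0) next
    z <- sample.int(K, length(words), replace = TRUE)  # random initial topics
    counts <- tabulate(z, nbins = K)
    for (iter in seq_len(num_iterations)) {
      for (i in seq_along(words)) {
        counts[z[i]] <- counts[z[i]] - 1L  # remove token i from its topic
        # p(z_i = k | rest) is proportional to phi[k, w_i] * (n_dk + alpha)
        p <- phi[, words[i]] * (counts + alpha)
        z[i] <- sample.int(K, 1, prob = p)
        counts[z[i]] <- counts[z[i]] + 1L
      }
    }
    doc_sums[, d] <- counts
  }
  doc_sums  # K x D topic counts for the new documents
}

# Toy example: topic 1 favors word 0, topic 2 favors word 1 (illustrative data)
set.seed(1)
topics <- matrix(c(20, 0, 0, 20), nrow = 2)
new_docs <- list(matrix(c(0L, 5L), nrow = 2),  # five tokens of word 0
                 matrix(c(1L, 4L), nrow = 2))  # four tokens of word 1
doc_sums <- fold_in(new_docs, topics, alpha = 0.1, eta = 0.1)
```

Under these assumptions, the returned matrix could be passed as the first argument of predictive.distribution() in place of result$document_sums. A single final sample is noisy; averaging the counts over several sweeps after a burn-in would give smoother estimates.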

Mallet implements the left-to-right estimator.
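For illustration, here is a base-R sketch of the left-to-right estimator following the description in Wallach et al. (2009). It estimates the log probability of one held-out document by walking through its tokens in order: R particles each hold topic assignments for the earlier tokens, those assignments are resampled before every step, and the predictive probability of the next token is averaged across particles. left_to_right is a hypothetical helper, not Mallet's implementation. It assumes a K x V matrix phi of topic-word probabilities (for a model fitted with the lda package, something like (result$topics + eta) / rowSums(result$topics + eta)) and a symmetric per-topic alpha. Because the lda format stores word counts rather than word order, the expanded token order would be arbitrary.

```r
# Hypothetical helper: left-to-right estimate of log p(w | phi, alpha) for ONE
# document, following the description in Wallach et al. (2009).
# words: integer vector of 1-based word indices, in document order
# phi:   K x V matrix of topic-word probabilities (rows sum to 1)
# alpha: symmetric per-topic Dirichlet parameter
# R:     number of particles
left_to_right <- function(words, phi, alpha, R = 20) {
  K <- nrow(phi)
  N <- length(words)
  z <- matrix(0L, nrow = R, ncol = N)       # topic assignments per particle
  counts <- matrix(0L, nrow = R, ncol = K)  # topic counts over tokens 1..n-1
  log_lik <- 0
  for (n in seq_len(N)) {
    p_n <- 0
    for (r in seq_len(R)) {
      # Resample the assignments of the earlier tokens m < n
      for (m in seq_len(n - 1)) {
        counts[r, z[r, m]] <- counts[r, z[r, m]] - 1L
        p <- phi[, words[m]] * (counts[r, ] + alpha)
        z[r, m] <- sample.int(K, 1, prob = p)
        counts[r, z[r, m]] <- counts[r, z[r, m]] + 1L
      }
      # Predictive probability of token n given this particle's tokens 1..n-1
      probs <- phi[, words[n]] * (counts[r, ] + alpha) / (n - 1 + K * alpha)
      p_n <- p_n + sum(probs)
      # Sample z_n from its conditional and add it to the counts
      z[r, n] <- sample.int(K, 1, prob = probs)
      counts[r, z[r, n]] <- counts[r, z[r, n]] + 1L
    }
    log_lik <- log_lik + log(p_n / R)  # average over particles
  }
  log_lik
}

# Toy example with made-up topic-word probabilities
set.seed(42)
phi <- rbind(c(0.7, 0.2, 0.1),
             c(0.1, 0.2, 0.7))
words <- c(1L, 1L, 3L, 1L)  # toy document as 1-based word indices
ll <- left_to_right(words, phi, alpha = 0.1, R = 10)
```

The loop over earlier tokens makes the cost quadratic in document length per particle, so in pure R this is only practical for short documents.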
