lda - Additional seedwords argument in the LDA() function from topicmodels

Question

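I am looking for an in-depth example of Latent Dirichlet Allocation (LDA) in which seed words are specified, using the topicmodels package in R.

The basic function has the form: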
LDA(x, k, method = "Gibbs", control = NULL, model = NULL, ...)

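and the documentation says only:

For method = "Gibbs" an additional argument seedwords can be specified as a matrix or an object of class "simple_triplet_matrix"; the default is NULL.

Can anyone give me a complete example of what this looks like and how it works?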

score 0 · Accepted Answer

Taken from this answer: https://stats.stackexchange.com/questions/384183/seeded-lda-using-topicmodels-in-r

library("topicmodels")
data("AssociatedPress", package = "topicmodels")

## Fit 6 topics. Topics 1-5 each get five seed words;
## topic 6 has no seed words.
library("slam")
set.seed(123)
i <- rep(1:5, each = 5)                   # topic (row) index of each seed word
j <- sample(1:ncol(AssociatedPress), 25)  # term (column) indices, drawn at random for illustration
SeedWeight <- 500 - 0.1
## Sparse 6 x (number of terms) matrix: SeedWeight at each seeded (topic, term) cell
deltaS <- simple_triplet_matrix(i, j, v = rep(SeedWeight, 25),
                                nrow = 6, ncol = ncol(AssociatedPress))
set.seed(1000)
ldaS <- LDA(AssociatedPress, k = 6, method = "Gibbs", seedwords = deltaS,
            control = list(alpha = 0.1, best = TRUE, verbose = 500,
                           burnin = 500, iter = 100, thin = 100,
                           prefix = character()))

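## Column indices of the seed words assigned to each topic
apply(deltaS, 1, function(x) which(x == SeedWeight))
## Column indices of the five highest-probability terms in each fitted topic
apply(posterior(ldaS)$terms, 1, function(x) order(x, decreasing = TRUE)[1:5])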
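
The example above seeds each topic with randomly sampled column indices, which is fine for a demonstration, but in practice one usually starts from actual words. The following is a minimal sketch of how such a matrix could be built from named seed words. The helper make_seedwords() and the toy vocabulary are hypothetical (they are not part of topicmodels or slam); only simple_triplet_matrix() and as.matrix() come from slam. It assumes the same layout as above: one row per topic, one column per term.

```r
library("slam")

# Hypothetical helper (not part of topicmodels): turn a list of seed words,
# one character vector per seeded topic, into a k x V simple_triplet_matrix.
# `terms` is the vocabulary in column order, `weight` the value per seed cell.
make_seedwords <- function(seeds, terms, k, weight) {
  stopifnot(length(seeds) <= k)
  words <- unlist(seeds)
  i <- rep(seq_along(seeds), lengths(seeds))  # topic (row) index of each seed word
  j <- match(words, terms)                    # term (column) index of each seed word
  if (anyNA(j)) {
    stop("seed words not in vocabulary: ",
         paste(words[is.na(j)], collapse = ", "))
  }
  simple_triplet_matrix(i, j, v = rep(weight, length(j)),
                        nrow = k, ncol = length(terms))
}

# Toy vocabulary and seed lists, for illustration only.
terms <- c("bank", "money", "river", "water", "vote", "party")
seeds <- list(c("bank", "money"),   # seeds for topic 1
              c("river", "water"))  # seeds for topic 2; topic 3 is unseeded
delta_named <- make_seedwords(seeds, terms, k = 3, weight = 500 - 0.1)

# Dense view of the same information, one row per topic.
as.matrix(delta_named)
```

With a real document-term matrix, one would presumably pass terms = colnames(AssociatedPress) and the same k as in the LDA() call, then supply the result as seedwords. Since the documentation quoted in the question allows either a matrix or a "simple_triplet_matrix", the dense form from as.matrix() should be accepted as well, although the sparse form is far smaller for a realistic vocabulary.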
