
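The lda.show_topics method in the code below only prints the distribution of the top 10 words for each topic. How can I print the full distribution over all the words in the corpus?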

from gensim import corpora, models

documents = ["Human machine interface for lab abc computer applications",
"A survey of user opinion of computer system response time",
"The EPS user interface management system",
"System and human system engineering testing of EPS",
"Relation of user perceived response time to error measurement",
"The generation of random binary unordered trees",
"The intersection graph of paths in trees",
"Graph minors IV Widths of trees and well quasi ordering",
"Graph minors A survey"]

stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

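# TF-IDF weighting of the bag-of-words corpus (corpus_tfidf is used below)
tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]

lda = models.ldamodel.LdaModel(corpus_tfidf, id2word=dictionary, num_topics=2)

for i in lda.show_topics():
    print(i)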

3 Answers


There is a parameter called topn in show_topics() with which you can specify how many of the top N words from each topic's word distribution you want. See http://radimrehurek.com/gensim/models/ldamodel.html

So instead of the default lda.show_topics(), you can pass len(dictionary) to get the full word distribution for each topic:

for i in lda.show_topics(topn=len(dictionary)):
    print(i)
answered 2013-07-15T20:16:42.123

There are two parameters in show_topics(), num_topics and num_words: for num_topics topics, it returns the num_words most significant words (10 words per topic by default). See http://radimrehurek.com/gensim/models/ldamodel.html#gensim.models.ldamodel.LdaModel.show_topics

So you can use len(lda.id2word) to get the full word distribution for each topic, and lda.num_topics to include all topics in your LDA model.

for i in lda.show_topics(formatted=False,num_topics=lda.num_topics,num_words=len(lda.id2word)):
    print(i)
answered 2016-05-17T15:09:42.317

The code below prints each topic's words together with their probabilities. It prints the top 10 words; change num_words=10 to print more words per topic.

for topic_id, word_probs in lda.show_topics(formatted=False, num_words=10):
    print(topic_id)
    print("******************************")
    # With id2word=dictionary, each entry is already a (word, probability) pair,
    # so no dictionary lookup is needed here.
    for word, prob in word_probs:
        print("(", word, ",", prob, ")", end=" ")
    print("")
    print("******************************")
answered 2017-10-19T18:32:08.323