python - Using Latent Dirichlet Allocation in Gensim

Question

I'm working on a project in which I want to use Latent Dirichlet Allocation to extract topics from a large number of articles.

My code looks like this:

import gensim
import csv
import json
import glob
from gensim import corpora, models
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer
from time import gmtime, strftime

tokenizer = RegexpTokenizer(r'\w+')
cachedStopWords = set(stopwords.words("english"))
body = []
processed = []

with open('/…/file.json') as j:
    data = json.load(j)

for i in range(0,len(data)):
    body.append(data[i]['text'].lower())

for entry in body:
    row = tokenizer.tokenize(entry)
    processed.append([word for word in row if word not in cachedStopWords])

dictionary = corpora.Dictionary(processed)
corpus = [dictionary.doc2bow(text) for text in processed]
lda = gensim.models.ldamodel.LdaModel(corpus, id2word=dictionary, num_topics=50, update_every=1, passes=1)
topics = lda.show_topics(num_topics=50, num_words=8)

other_doc = "After being jailed for life in 1964, Nelson Mandela became a worldwide symbol of resistance to apartheid. But his opposition to racism began many years before."
print lda[other_doc]

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gensim/models/ldamodel.py", line 714, in __getitem__
    gamma, _ = self.inference([bow])
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gensim/models/ldamodel.py", line 361, in inference
    ids = [id for id, _ in doc]
ValueError: need more than 1 value to unpack

I also tried using LdaMulticore in 3 different ways:

lda = gensim.models.LdaMulticore(corpus, id2word=dictionary, num_topics=100, workers=3)
lda = gensim.models.ldamodel.LdaMulticore(corpus, id2word=dictionary, num_topics=100, workers=3)
lda = models.LdaMulticore(corpus, id2word=dictionary, num_topics=100, workers=3)

Every time, I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'LdaMulticore'

Any ideas?

Thanks in advance.

score 3 · Accepted Answer

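You have to convert the document into the model's vector space first. lda[...] expects a bag-of-words list of (token_id, count) pairs, not a raw string. Given a string, the inference step iterates over it one character at a time, and "for id, _ in doc" cannot unpack a single character into two values, hence the ValueError.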

http://radimrehurek.com/gensim/tut3.html#similarity-interface

vec_bow = dictionary.doc2bow(other_doc.lower().split())
vec_lda = lda[vec_bow]  # convert the query into LDA topic space
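A minimal pure-Python sketch of why the raw string fails and what the model expects instead (no gensim required). The helpers preprocess, to_bow and ids_from_bow are hypothetical: ids_from_bow copies the unpacking line from the traceback, to_bow imitates the (token_id, count) list that Dictionary.doc2bow returns, and the stop-word set and vocabulary are made-up stand-ins for NLTK's list and the trained dictionary. preprocess reuses the question's \w+ pattern, because plain .split() keeps tokens such as "1964," that never occur in a dictionary built from RegexpTokenizer output.

```python
import re

# Same pattern as RegexpTokenizer(r'\w+') in the question.
TOKEN_RE = re.compile(r"\w+")
# Hypothetical, tiny stop-word list; the question uses NLTK's English list instead.
STOPWORDS = {"after", "being", "for", "in", "a", "of", "to", "but", "his", "before", "many", "years"}


def preprocess(text):
    """Lower-case, split into word-character runs and drop stop words, like the training loop."""
    return [tok for tok in TOKEN_RE.findall(text.lower()) if tok not in STOPWORDS]


def to_bow(tokens, token2id):
    """Hypothetical stand-in for Dictionary.doc2bow: (token_id, count) pairs, unknown tokens dropped."""
    counts = {}
    for tok in tokens:
        if tok in token2id:
            tid = token2id[tok]
            counts[tid] = counts.get(tid, 0) + 1
    return sorted(counts.items())


def ids_from_bow(doc):
    """Copy of the failing line from the traceback: expects an iterable of (id, count) pairs."""
    return [id_ for id_, _ in doc]


if __name__ == "__main__":
    other_doc = ("After being jailed for life in 1964, Nelson Mandela became "
                 "a worldwide symbol of resistance to apartheid.")
    # Hypothetical vocabulary standing in for the trained dictionary.
    token2id = {"mandela": 0, "apartheid": 1, "resistance": 2, "1964": 3}

    try:
        ids_from_bow(other_doc)  # a raw string is iterated one character at a time
    except ValueError as err:
        print("raw string fails: %s" % err)

    bow = to_bow(preprocess(other_doc), token2id)
    print(bow)                # [(0, 1), (1, 1), (2, 1), (3, 1)]
    print(ids_from_bow(bow))  # [0, 1, 2, 3]
```

With the real objects the same idea is lda[dictionary.doc2bow(tokens)], where tokens comes from the same tokenizer and stop-word filter that were used to build dictionary.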

score 0 · Accepted Answer

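I realize this is old, but I ran into the same problem. You are probably pointing at an older version of Gensim. You need to make sure you are using version >= 0.10.2.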

Update with "easy_install -U gensim", then make sure your IDE picks up the updated library.
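A small sketch for checking which Gensim the interpreter (and therefore the IDE) actually imports: gensim.__version__ gives the version and gensim.__file__ the location of the copy that was loaded. version_tuple, supports_ldamulticore and report are hypothetical helpers; the parsing is deliberately simple (it keeps only the leading digits of the first three components), and the 0.10.2 threshold is the one stated in this answer.

```python
import re

# Minimum version stated in the answer for LdaMulticore.
MIN_LDAMULTICORE = (0, 10, 2)


def version_tuple(version):
    """Hypothetical helper: '0.10.2' -> (0, 10, 2); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_ldamulticore(version):
    """True if the version string is at least MIN_LDAMULTICORE."""
    return version_tuple(version) >= MIN_LDAMULTICORE


def report():
    """Print which gensim this interpreter imports and whether LdaMulticore is available."""
    try:
        import gensim
    except ImportError:
        print("gensim is not importable from this interpreter")
        return
    # __file__ shows which installation was imported, e.g. an old copy on the IDE's path.
    print("version: %s" % gensim.__version__)
    print("location: %s" % gensim.__file__)
    print("version >= 0.10.2: %s" % supports_ldamulticore(gensim.__version__))
    print("has LdaMulticore: %s" % hasattr(gensim.models, "LdaMulticore"))
```

Call report() from the same interpreter your IDE is configured to use; if the printed location points at an old installation, that copy is likely the one raising the AttributeError.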
