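I am working with Reddit data and trying to find topics with NMF topic modeling. The model trains fine and produces topics, but when I try to visualize it at the end, I get the following error.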
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
      6 warnings.filterwarnings('ignore')
      7 get_ipython().run_line_magic('matplotlib', 'inline')
----> 8 panel = pyLDAvis.sklearn.prepare(nmf, doc_term_matrix, tfidf_vect, mds='mmds')
      9 pyLDAvis.display(panel)

~/PycharmProjects/News/venv/lib/python3.7/site-packages/pyLDAvis/sklearn.py in prepare(lda_model, dtm, vectorizer, **kwargs)
     93     """
     94     opts = fp.merge(_extract_data(lda_model, dtm, vectorizer), kwargs)
---> 95     return pyLDAvis.prepare(**opts)

~/PycharmProjects/News/venv/lib/python3.7/site-packages/pyLDAvis/_prepare.py in prepare(topic_term_dists, doc_topic_dists, doc_lengths, vocab, term_frequency, R, lambda_step, mds, n_jobs, plot_opts, sort_topics)
    372     doc_lengths = _series_with_name(doc_lengths, 'doc_length')
    373     vocab = _series_with_name(vocab, 'vocab')
--> 374     _input_validate(topic_term_dists, doc_topic_dists, doc_lengths, vocab, term_frequency)
    375     R = min(R, len(vocab))
    376

~/PycharmProjects/News/venv/lib/python3.7/site-packages/pyLDAvis/_prepare.py in _input_validate(*args)
     63     res = _input_check(*args)
     64     if res:
---> 65         raise ValidationError(' \n' + '\n'.join([' * ' + s for s in res]))
     66
     67

ValidationError:
 * Not all rows (distributions) in doc_topic_dists sum to 1.
My visualization code is:
import pyLDAvis
import pyLDAvis.sklearn
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
panel = pyLDAvis.sklearn.prepare(nmf, doc_term_matrix, tfidf_vect, mds='mmds')
pyLDAvis.display(panel)
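The validation message is about the per-document topic weights, so it can help to see which rows are affected. The following is only a diagnostic sketch, not pyLDAvis's own code: it normalizes each row and lists the rows that still do not sum to 1. An all-zero row, for example, cannot be normalized and ends up as NaN. The function names and the toy `weights` list are hypothetical; with the code above, the real input would presumably be something like `nmf.transform(doc_term_matrix).tolist()`, assuming `nmf` is a fitted scikit-learn NMF model.

```python
import math

def normalize_rows(rows):
    """Divide each row by its sum; a row summing to zero becomes all NaN (like 0/0)."""
    out = []
    for row in rows:
        total = math.fsum(row)
        if total == 0:
            out.append([float("nan")] * len(row))
        else:
            out.append([x / total for x in row])
    return out

def rows_not_summing_to_one(rows, tol=1e-6):
    """Return the indices of rows that contain NaN or whose sum differs from 1 by more than tol."""
    bad = []
    for i, row in enumerate(rows):
        if any(math.isnan(x) for x in row) or abs(math.fsum(row) - 1.0) > tol:
            bad.append(i)
    return bad

# Toy document-topic weights (hypothetical data): row 1 is all zeros.
weights = [[0.2, 0.6], [0.0, 0.0], [1.5, 0.5]]
normalized = normalize_rows(weights)
bad_rows = rows_not_summing_to_one(normalized)
print(bad_rows)  # -> [1]
```

If the reported indices point at documents with no non-zero topic weight, those documents are the likely candidates to inspect or drop before visualizing.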
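I searched Stack Overflow for an answer and found this question: https://stackoverflow.com/questions/55712807/pyldavis-validation-error-on-trying-to-visualize-topics-with-btm. I tried the fix from its accepted answer (simply removing the rows with very few words, in my case rows with fewer than 3 words), but the error persists.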
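A minimal sketch of that kind of filter is shown here for reference. The names `drop_short_documents` and `docs`, and the sample strings, are hypothetical and not taken from the actual pipeline; the sketch assumes the documents are plain strings and counts whitespace-separated words.

```python
def drop_short_documents(docs, min_words=3):
    """Keep only documents with at least min_words whitespace-separated tokens."""
    return [doc for doc in docs if len(doc.split()) >= min_words]

# Hypothetical sample documents.
docs = ["great post", "this thread is very helpful", "", "the of and"]
kept = drop_short_documents(docs)
print(kept)  # -> ['this thread is very helpful', 'the of and']
```

Two caveats apply. The document-term matrix and the model have to be rebuilt from the filtered list so that their rows line up with the remaining documents. A word-count filter can also miss documents: a text such as "the of and" has three words, yet it may still vectorize to an all-zero row if every word is a stop word or otherwise outside the vectorizer's vocabulary.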
Can anyone help me solve this?
Thanks in advance.