python - ValueError: array must not contain infs or NaNs when using NMF with TF-IDF in Python

Asked 2021-08-20T20:19:09.870

Viewed 45 times

I am trying to estimate topics by applying an NMF decomposition to a TF-IDF matrix. But when I run the following lines:

nmf = NMF(n_components = dimension)
nmf_array = nmf.fit_transform(x_tfidf)

I get this error:

ValueError: array must not contain infs or NaNs

However, when I search the TF-IDF matrix for NaNs and infs, I find none:

np.isinf(x_tfidf.data).any()  # this returns False
np.isnan(x_tfidf.data).any()  # this also returns False
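The two checks above only look at the stored values. A slightly broader health check on the sparse matrix might look like the sketch below. The helper name `summarize_sparse` and the toy matrix are hypothetical and stand in for `x_tfidf`; nothing here is claimed to identify the cause of the error.

```python
import numpy as np
from scipy import sparse


def summarize_sparse(matrix):
    """Return basic health indicators for a SciPy sparse matrix.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    matrix = sparse.csr_matrix(matrix)
    row_nnz = np.diff(matrix.indptr)  # number of stored entries per row
    return {
        "shape": matrix.shape,
        "dtype": str(matrix.dtype),
        "all_finite": bool(np.isfinite(matrix.data).all()),
        "empty_rows": int((row_nnz == 0).sum()),
        "negative_values": int((matrix.data < 0).sum()),
    }


# Toy matrix standing in for x_tfidf (assumption: the real one is sparse too)
toy = sparse.csr_matrix(np.array([[0.5, 0.0], [0.0, 0.0], [0.2, 0.8]]))
report = summarize_sparse(toy)
print(report)
```

Calling `summarize_sparse(x_tfidf)` on the real matrix would show in one place whether it holds non-finite or negative values, or rows with no stored entries.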

The full code is:

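import nltk
from nltk.corpus import stopwords
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

nltk.download('stopwords')
stop_words_sp = stopwords.words('spanish')
custom_stop_words = ["https", "citar", "www", "com", "youtube", "mil", "ar", "hs"]
stop_words = custom_stop_words + stop_words_sp
# stemmed_words is a custom analyzer defined elsewhere (not shown).
# Note: with a callable analyzer, scikit-learn ignores stop_words and lowercase.
count_vect = CountVectorizer(max_df=0.9, min_df=0.1, stop_words=stop_words, lowercase=True, analyzer=stemmed_words)
x_counts = count_vect.fit_transform(textos)

# Build the TF-IDF weighted matrix
tfidf_transformer = TfidfTransformer()
x_tfidf = tfidf_transformer.fit_transform(x_counts)
lda = NMF(n_components=2)
lda_array = lda.fit_transform(x_tfidf)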

The variable textos is an array of Spanish-language texts that contains no empty strings.
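Even when no input string is empty, a document can end up with no terms after vectorization, for example when every one of its tokens is dropped by `min_df` or `max_df`. The sketch below shows one way to look for such rows. The toy documents and the `min_df=2` setting are assumptions made for illustration, and this is a diagnostic step only, not a confirmed explanation of the error.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus (hypothetical); "zzz" appears in only one document
docs = ["gato perro", "gato casa", "zzz", "perro casa"]

# Keep only terms that occur in at least 2 documents
vectorizer = CountVectorizer(min_df=2)
counts = vectorizer.fit_transform(docs)

# Sum the counts in each row; a total of 0 means the document lost all its terms
row_totals = np.asarray(counts.sum(axis=1)).ravel()
empty_doc_indices = np.flatnonzero(row_totals == 0)

print(sorted(vectorizer.vocabulary_))
print(empty_doc_indices)
```

Applied to the question's data, the same two lines computing `row_totals` and `empty_doc_indices` could be run on `x_counts` to see whether any text in `textos` contributes an all-zero row.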

This is the full traceback:

[traceback not preserved]

0 answers
