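I have the following output after running a TF-IDF vectorizer. I want to put the dense output into a pandas DataFrame column, but I can't apply toarray or todense to the sparse TF-IDF output and pass the result directly to a DataFrame column. So I stored the dense TF-IDF output in a list instead. The list has shape (6, 20), and I want to place each row of the list into the corresponding row of a DataFrame column, since the DataFrame column also has length 6. I tried converting the list to a pandas Series and passing that to the DataFrame, but this does not work for a two-dimensional list.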
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
new_docs = ['Men Tops Tshirts missing ', 'Electronics Computers Tablets Components Parts Razer',
'Women Tops Blouses Blouse Target ', 'Home Home Décor Home Décor Accents missing ',
'Women Jewelry Necklaces missing ', 'Women Other Other missing ']
vectorizer = TfidfVectorizer(TfidfVectorizer(ngram_range=(1,2),
min_df=3, max_df=0.9, strip_accents='unicode', use_idf=1,
smooth_idf=1, sublinear_tf=1 ))
new_term_freq_matrix = vectorizer.fit_transform(new_docs)
print(vectorizer.vocabulary_)
print(new_term_freq_matrix.todense())
example = pd.DataFrame({'test_data_column': new_docs})
lt_1 = []
lt_1 = (vectorizer.fit_transform(new_docs)).toarray()
print(lt_1)
print(lt_1.shape)
(6, 20)
print(example)
test_data_column
0 Men Tops Tshirts missing
1 Electronics Computers Tablets Components Parts Razer
2 Women Tops Blouses Blouse Target
3 Home Home Décor Home Décor Accents missing
4 Women Jewelry Necklaces missing
5 Women Other Other missing
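
For reference, here is a minimal sketch of two common ways to attach a 2-D dense TF-IDF array to a DataFrame with the same number of rows. It is an illustration under stated assumptions, not a definitive solution: the vectorizer is created with default settings, and the names `dense`, `tfidf_vector`, `terms`, `tfidf_df` and `wide` are hypothetical and do not appear in the code above.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

new_docs = ['Men Tops Tshirts missing ',
            'Electronics Computers Tablets Components Parts Razer',
            'Women Tops Blouses Blouse Target ',
            'Home Home Décor Home Décor Accents missing ',
            'Women Jewelry Necklaces missing ',
            'Women Other Other missing ']
example = pd.DataFrame({'test_data_column': new_docs})

# Assumption: default vectorizer settings, used here only for illustration.
vectorizer = TfidfVectorizer()
# toarray() turns the sparse matrix into a 2-D NumPy array, one row per document.
dense = vectorizer.fit_transform(new_docs).toarray()

# Option 1: keep the whole vector in a single column.
# tolist() yields one Python list per document, so each cell holds one row.
example['tfidf_vector'] = dense.tolist()

# Option 2: one column per vocabulary term, joined to the original frame.
# Sorting the vocabulary by its stored index recovers the column order of `dense`.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
tfidf_df = pd.DataFrame(dense, columns=terms, index=example.index)
wide = pd.concat([example[['test_data_column']], tfidf_df], axis=1)

print(example.head())
print(wide.shape)
```

Option 1 matches the "one row of the list per DataFrame row" layout described in the question, at the cost of an object-typed column. Option 2 is usually easier to work with afterwards, because every term becomes an ordinary numeric column.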