TF-IDF raises a ValueError; it worked fine before, but now it throws an error

tf_idf_vectorizer = TfidfVectorizer(ngram_range=(2,2))
tf_train=tf_idf_vectorizer.fit_transform(X_train)
tf_test= tf_idf_vectorizer.transform(X_test)
model=LogisticRegression()
model.fit(X_train,y_train)
y_predict=model.predict(X_test)

ValueError: X has 97624 features per sample; expecting 11

1 Answer

It should be model.fit(tf_train, y_train) and model.predict(tf_test):

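from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Learn the bigram vocabulary on the training text only, then reuse it for the test text
tf_idf_vectorizer = TfidfVectorizer(ngram_range=(2, 2))
tf_train = tf_idf_vectorizer.fit_transform(X_train)
tf_test = tf_idf_vectorizer.transform(X_test)

# Fit and predict on the TF-IDF matrices, not on the raw text
model = LogisticRegression()
model.fit(tf_train, y_train)
y_predict = model.predict(tf_test)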

Fit the model on the transformed training input, i.e. tf_train (the output of fit_transform), and call model.predict on the transformed test input, i.e. tf_test.
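
One way to make this mix-up harder to repeat is to chain the vectorizer and the classifier, so the raw text is the only thing you ever pass around. The following is a minimal sketch of that alternative using scikit-learn's make_pipeline, not the approach from the question; the toy sentences, the labels, and the name text_clf are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data, invented for this example only
X_train = ["good movie tonight", "bad movie tonight",
           "good film today", "bad film today"]
y_train = [1, 0, 1, 0]
X_test = ["good movie today", "bad film tonight"]

# The pipeline applies fit_transform during fit and transform during predict,
# so the classifier never sees the raw strings
text_clf = make_pipeline(
    TfidfVectorizer(ngram_range=(2, 2)),
    LogisticRegression(),
)

text_clf.fit(X_train, y_train)
y_predict = text_clf.predict(X_test)
print(y_predict)
```

With this layout there is no separate tf_train or tf_test variable to confuse with X_train or X_test.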


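As a sanity check, compare the feature counts (the number of columns, .shape[1]) of what you passed to model.fit and to model.predict; note that len() only counts samples. According to the error message, the model was fitted on input with 11 features, while the input passed to predict has 97624. That mismatch is where the error comes from: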

ValueError: X has 97624 features per sample; expecting 11
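
The check can be sketched roughly as follows. This is an illustration on made-up toy data, not the asker's dataset; the variable n_fit_features is a name introduced here, and it reads the feature count from the fitted model's coef_ attribute.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy data, invented for this example only
X_train = ["good movie tonight", "bad movie tonight",
           "good film today", "bad film today"]
y_train = [1, 0, 1, 0]
X_test = ["good movie today", "bad film tonight"]

vectorizer = TfidfVectorizer(ngram_range=(2, 2))
tf_train = vectorizer.fit_transform(X_train)
tf_test = vectorizer.transform(X_test)

model = LogisticRegression()
model.fit(tf_train, y_train)

# Rows are samples, columns are features
n_fit_features = model.coef_.shape[1]
print("train:", tf_train.shape, "test:", tf_test.shape,
      "model expects:", n_fit_features)

# predict only works when the column counts agree
assert tf_test.shape[1] == n_fit_features
y_predict = model.predict(tf_test)
```

If the last assertion fails in your own code, the matrix passed to predict was not produced by the same fitted vectorizer as the one used for training.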

P.S. Take a careful look at https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html

Answered 2020-02-24T15:52:17.813