machine-learning - Show feature names after feature selection

Question

I need to build a classifier for text, and I am currently using TfidfVectorizer and SelectKBest to select features, as follows:

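from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# decode_error replaces the charset_error parameter of older scikit-learn releases
vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5, stop_words='english', decode_error='strict')

X_train_features = vectorizer.fit_transform(data_train.data)
y_train_labels = data_train.target

ch2 = SelectKBest(chi2, k=1000)
X_train_features = ch2.fit_transform(X_train_features, y_train_labels)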

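After selecting the k best features, I want to print the names of the selected features (the text terms). Is there a way to do this? I only need to print the selected feature names. Or should I perhaps use CountVectorizer instead?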

score 17 · Accepted Answer

The following should work:

import numpy as np
np.asarray(vectorizer.get_feature_names())[ch2.get_support()]

Answered 2013-01-03T08:18:57.240

score 9 · Accepted Answer

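To expand on @ogrisel's answer: the returned list of features keeps the same order the features had after vectorization. The code below gives you a list of the top-ranked features, sorted in descending order of their chi-squared scores, along with the corresponding p-values: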

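import numpy as np
# Rank all features by chi-squared score, highest first, and keep the top 1000
top_ranked_features = sorted(enumerate(ch2.scores_), key=lambda x: x[1], reverse=True)[:1000]
top_ranked_features_indices = [index for index, _ in top_ranked_features]
# `vectorizer` and `ch2` are the fitted objects from the question; on scikit-learn >= 1.0 use get_feature_names_out()
feature_names = np.asarray(vectorizer.get_feature_names())
for feature_pvalue in zip(feature_names[top_ranked_features_indices], ch2.pvalues_[top_ranked_features_indices]):
    print(feature_pvalue)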
