My data looks like this:
DATA | FEATURE1 | FEATURE2 | ...
I | 0.3213 | 1.231 | ...
A | 5.0945 | 0.923 | ...
I | 0.3213 | 0.761 | ...
... | ... | .... | ...
I am using this code:
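import csv
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

def get_ranks(path_to_csv_file, feature_columns, label_column):
    # Read every row; the first row is assumed to be the header and is skipped
    with open(path_to_csv_file, newline="") as f:
        rows = np.array(list(csv.reader(f)))[1:]
    # Select columns (not rows) and convert the feature values to floats
    features = rows[:, feature_columns].astype(float)
    label = rows[:, label_column]
    mutual_info = mutual_info_classif(features, label)
    return mutual_info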
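To compare against a ranking, scores such as the `mutual_info` array still have to be turned into an ordering of columns. A minimal sketch of that step, assuming a higher score means a more informative feature; the helper name `rank_features` and the example scores are hypothetical and not part of scikit-learn or Weka:

```python
def rank_features(scores):
    """Return feature indices ordered from highest to lowest score."""
    # sorted() is stable, so features with tied scores keep their column order
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Made-up scores for three features, for illustration only
example_scores = [0.12, 0.45, 0.03]
print(rank_features(example_scores))  # prints [1, 0, 2]
```

The helper only needs `len()` and indexing, so it should accept a plain list or a one-dimensional NumPy array alike.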
With Weka, all I need to do is select InfoGainAttributeEval to get the feature ranks. For some reason, I do not get the same ranking with the code above.
What is the problem?