import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestClassifier

kf = KFold(n_splits=10, shuffle=True)
print(type(kf))
features = ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9']
fold_idx = 1
accrs = []
for train_idx, test_idx in kf.split(final_data):
    print('[Fold{}] train size={}, test size={}'.format(fold_idx, len(train_idx), len(test_idx)))
    train_d, test_d = final_data.iloc[train_idx], final_data.iloc[test_idx]
    train_y = train_d['class']
    train_x = train_d[features]
    test_y = test_d['class']
    test_y = test_d[features]
    model = RandomForestClassifier()
    model.fit(train_x, train_y)
    mean_accr = model.score(test_x, test_y)
    fold_idx += 1
    accrs.append(mean_accr)
print(np.average(accrs))
The problem is here:
23 mean_accr = model.score(test_x, test_y)
ValueError: Found input variables with inconsistent numbers of samples: [2087, 4174]
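The shape of final_data is (20876, 11). I have already tried reshaping, but that does not work.
I don't understand why this happens, because when I use RandomForestClassifier on its own, it runs fine.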
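One thing worth checking: inside the loop, `test_y` is assigned twice and `test_x` is never assigned, so `model.score(test_x, test_y)` would pick up whatever `test_x` was defined earlier in the session (probably a leftover from a previous split, which would explain the 4174 versus 2087 mismatch). A minimal sketch of the loop with that assignment changed is below. The synthetic `final_data` built with `make_classification`, the small `n_estimators`, and the `random_state` values are assumptions for illustration only, not the original data or settings.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

features = ['f{}'.format(i) for i in range(10)]

# Synthetic stand-in for final_data (assumption: 10 feature columns plus 'class').
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
final_data = pd.DataFrame(X, columns=features)
final_data['class'] = y

kf = KFold(n_splits=10, shuffle=True, random_state=0)
accrs = []
for fold_idx, (train_idx, test_idx) in enumerate(kf.split(final_data), start=1):
    train_d, test_d = final_data.iloc[train_idx], final_data.iloc[test_idx]
    train_x, train_y = train_d[features], train_d['class']
    # Features go to test_x and labels to test_y, both taken from this fold.
    test_x, test_y = test_d[features], test_d['class']
    model = RandomForestClassifier(n_estimators=20, random_state=0)
    model.fit(train_x, train_y)
    accrs.append(model.score(test_x, test_y))

print(np.average(accrs))
```

With both `test_x` and `test_y` taken from the same `test_d`, they have the same number of rows in every fold, so no reshaping should be needed.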
I also have a similar problem with GridSearchCV. In that case, when I call
gcv.fit(train_x, train_y)
it reports: ValueError: Invalid parameter classifier for estimator RandomForestClassifier
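The parameter grid is not shown here, so this is only a guess: that message typically appears when the grid keys carry a `classifier__` prefix (the naming used for a `Pipeline` step called `classifier`) while the estimator passed to `GridSearchCV` is a bare `RandomForestClassifier`. The sketch below contrasts the two cases. The data, the grid values, and the step name `'classifier'` are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Small synthetic data set (assumption, stands in for train_x / train_y).
X, y = make_classification(n_samples=120, n_features=10, random_state=0)

# Case 1: bare estimator, so grid keys are the estimator's own parameter names.
rf = RandomForestClassifier(random_state=0)
grid_plain = {'n_estimators': [10, 20], 'max_depth': [3, None]}
gcv = GridSearchCV(rf, grid_plain, cv=3)
gcv.fit(X, y)

# Case 2: estimator wrapped in a Pipeline whose step is named 'classifier',
# so grid keys take the form '<step name>__<parameter name>'.
pipe = Pipeline([('classifier', RandomForestClassifier(random_state=0))])
grid_pipe = {'classifier__n_estimators': [10, 20]}
gcv_pipe = GridSearchCV(pipe, grid_pipe, cv=3)
gcv_pipe.fit(X, y)

print(gcv.best_params_)
print(gcv_pipe.best_params_)
```

The valid key names for any estimator can be listed with `estimator.get_params().keys()`, which is a quick way to compare them against the grid.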