import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestClassifier

kf = KFold(n_splits=10, shuffle=True)
print(type(kf))

features = ['f0','f1','f2','f3','f4','f5','f6','f7','f8','f9']

fold_idx = 1
accrs = []
for train_idx, test_idx in kf.split(final_data):
    print('[Fold{}] train size={}, test size={}'.format(fold_idx, len(train_idx), len(test_idx)))
    train_d, test_d = final_data.iloc[train_idx], final_data.iloc[test_idx]

    train_y = train_d['class']
    train_x = train_d[features]

    test_y = test_d['class']
    test_x = test_d[features]  # this fold's features must go into test_x, not test_y

    model = RandomForestClassifier()
    model.fit(train_x, train_y)

    mean_accr = model.score(test_x, test_y)  # accuracy on this fold

    fold_idx += 1
    accrs.append(mean_accr)

print(np.average(accrs))
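For comparison, the same 10-fold evaluation can be expressed with scikit-learn's `cross_val_score`, which slices features and labels with the same indices on every fold, so they cannot drift apart. This is a minimal sketch, assuming synthetic data from `make_classification` as a stand-in for `final_data` (10 feature columns `f0`..`f9` plus a `class` column) and an added `random_state` for reproducibility.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical stand-in for final_data: columns f0..f9 plus a 'class' label column.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
features = [f"f{i}" for i in range(10)]
final_data = pd.DataFrame(X, columns=features)
final_data["class"] = y

# Same shuffled 10-fold split as the manual loop (random_state is an assumption).
kf = KFold(n_splits=10, shuffle=True, random_state=0)

# With scoring=None, cross_val_score uses the classifier's .score(), i.e. accuracy,
# and always passes matching X and y rows for each fold.
scores = cross_val_score(
    RandomForestClassifier(random_state=0),
    final_data[features],
    final_data["class"],
    cv=kf,
)
print(scores.shape)
print(np.average(scores))
```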

This is where the problem is:

23     mean_accr = model.score(test_x, test_y)
ValueError: Found input variables with inconsistent numbers of samples: [2087, 4174]
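The numbers in the error can be checked against the fold sizes. `KFold` gives each fold `n_samples // n_splits` rows and spreads the remainder over the first folds, so with the 20876 rows stated below, every test fold has 2087 or 2088 rows. The 2087 therefore matches one fold's data, while the 4174-row argument cannot have come from this loop and is presumably a `test_x` left over from earlier code (an assumption, since that code is not shown). The sketch below only counts fold sizes on a dummy array of the same length.

```python
import numpy as np
from sklearn.model_selection import KFold

n_rows = 20876  # row count of final_data, as stated in the question
kf = KFold(n_splits=10, shuffle=True, random_state=0)

# Shuffling changes which rows land in each fold, not how many.
test_sizes = [len(test_idx) for _, test_idx in kf.split(np.zeros((n_rows, 1)))]
print(test_sizes)  # [2088, 2088, 2088, 2088, 2088, 2088, 2087, 2087, 2087, 2087]
```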

final_data has shape (20876, 11). I have already tried reshaping, but it doesn't help.

I don't understand why this happens, because when I use RandomForestClassifier on its own, it runs fine.

I also have a similar problem with GridSearchCV. In that case, the failing line is

gcv.fit(train_x, train_y)

and the error is: ValueError: Invalid parameter classifier for estimator RandomForestClassifier
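The question does not show how `gcv` was constructed, so the following is an assumption: this message typically appears when the parameter grid uses keys like `classifier__n_estimators` (the naming for a `Pipeline` step called `classifier`) while the estimator passed to `GridSearchCV` is a bare `RandomForestClassifier`. `GridSearchCV` applies each grid point through the estimator's `set_params`, which rejects the unknown `classifier` prefix. The sketch reproduces that with `set_params` directly and then shows two consistent setups on synthetic data; the step name and grid values are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# A bare estimator has no sub-estimator named "classifier", so this prefix is rejected.
raised = False
try:
    RandomForestClassifier().set_params(classifier__n_estimators=10)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Option 1: wrap the model in a Pipeline whose step is named "classifier".
pipe = Pipeline([("classifier", RandomForestClassifier(random_state=0))])
gcv_pipe = GridSearchCV(pipe, {"classifier__n_estimators": [10, 50]}, cv=3)
gcv_pipe.fit(X, y)

# Option 2: keep the bare estimator and use its own parameter names, without a prefix.
gcv_bare = GridSearchCV(RandomForestClassifier(random_state=0), {"n_estimators": [10, 50]}, cv=3)
gcv_bare.fit(X, y)

print(gcv_pipe.best_params_)
print(gcv_bare.best_params_)
```

Either option works; the key is that grid parameter names must match the structure of the estimator actually passed in.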

