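I have a LightGBM multiclass classification model and I want to build a confusion matrix for it. As a first step, I just want to lay out the predicted values against the actual values in a DataFrame. My question is whether `lightgbm.predict` returns its predictions in the same order as the rows of the dataset you pass to it.
Following the code below, does my "Predictions" part correctly match each test-set row with its corresponding prediction row?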
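Once predicted and actual labels sit side by side, the confusion matrix is just a table of counts: rows are actual classes, columns are predicted classes. Below is a minimal NumPy sketch. It assumes integer class labels `0..k-1`, and the example labels are hypothetical:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix: rows are actual classes, columns are predicted classes."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when the same (row, col) pair repeats,
    # unlike cm[y_true, y_pred] += 1, which would count each pair only once
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Hypothetical labels for six rows of a 3-class problem
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 1]
cm = confusion_matrix(y_true, y_pred, 3)
print(cm)
```

The diagonal holds the correctly classified rows. Each off-diagonal cell `cm[i, j]` counts rows of class `i` that were predicted as class `j`.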
Here is how I create the train and test sets:
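```python
import lightgbm as lgb

# split train and test into X (features) and Y (label, assumed to be the last column)
# slice ends are exclusive, so 0:(n-1) keeps every column except the last one
X_train = train_data[:, 0:(model.shape[1]-1)]
Y_train = train_data[:, model.shape[1]-1]  # python starts counting at 0
X_test = test_data[:, 0:(model.shape[1]-1)]
Y_test = test_data[:, model.shape[1]-1]
# training and eval dataset; validation data should reference the training Dataset
lgb_train = lgb.Dataset(data=X_train, label=Y_train)
lgb_test = lgb.Dataset(data=X_test, label=Y_test, reference=lgb_train)
```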
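Python slice ends are exclusive, and this is easy to trip over when splitting features from the label. A slice that ends at `shape[1]-2` stops two columns short of the end, so it drops the last feature as well as the label. Here is a minimal NumPy illustration. It assumes, as the `Y_train` line implies, that the label is the last column and every other column is a feature. The array `data` is hypothetical:

```python
import numpy as np

# Hypothetical 4-column array: three feature columns followed by the label column
data = np.arange(12).reshape(3, 4)
n_cols = data.shape[1]  # 4

# 0:(n_cols - 2) stops *before* index 2, so only columns 0 and 1 are kept
# and feature column 2 is silently dropped
X_wrong = data[:, 0:(n_cols - 2)]

# 0:(n_cols - 1), or simply :-1, keeps every column except the last (the label)
X = data[:, :-1]
y = data[:, -1]

print(X_wrong.shape, X.shape, y)
```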
Running the model:
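```python
# run model
# LightGBM >= 4.0 removed the early_stopping_rounds argument from lgb.train;
# early stopping is configured with the lgb.early_stopping callback instead
bst_model = lgb.train(params=parameters, train_set=lgb_train, num_boost_round=1000,
                      valid_sets=[lgb_train, lgb_test],
                      callbacks=[lgb.early_stopping(stopping_rounds=7)])
                      # categorical_feature=categoricals_vec)
```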
Then the predictions:
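```python
# Predictions
import pandas as pd

# multiclass predict returns an array of shape (n_test_rows, n_classes) with class probabilities
preds = bst_model.predict(X_test)
preds_df = pd.DataFrame(preds, columns=['0', '1', '2'])
preds_df['pred'] = preds_df.idxmax(axis=1)  # column label ('0', '1' or '2') of the most probable class
preds_df['actual'] = boost_data_set.iloc[0:breakpoint, boost_data_set.shape[1]-1]
```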
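LightGBM's `Booster.predict` returns one row of output per input row, in the order the rows were passed in, so row `i` of `preds` belongs to row `i` of `X_test`. The more likely pitfall is on the pandas side. Assigning a `Series` to a DataFrame column aligns on index labels, not on position. If the "actual" Series carries an index that does not match the DataFrame's `0..n-1` index, values land in the wrong rows or become `NaN`. Below is a minimal sketch. All arrays and index values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical class probabilities for 3 test rows (one row per input row, same order)
preds = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3]])
# Hypothetical labels whose index does NOT start at 0, as happens when the
# test rows are sliced from the middle or end of a larger DataFrame
actual = pd.Series([0, 2, 2], index=[100, 101, 102])

preds_df = pd.DataFrame(preds, columns=['0', '1', '2'])  # RangeIndex 0..2

# Assigning a Series aligns on index labels: none of 100..102 exist, so all values are NaN
preds_df['actual_by_label'] = actual

# argmax yields integer classes; .to_numpy() drops the index, so assignment is positional
preds_df['pred'] = preds.argmax(axis=1)
preds_df['actual'] = actual.to_numpy()

print(preds_df)
# Confusion matrix as a count table: rows = actual class, columns = predicted class
print(pd.crosstab(preds_df['actual'], preds_df['pred']))
```

In terms of the variables above, the positional counterpart of `preds` is `Y_test`, which holds the same rows that were passed to `predict`. A separate slice of `boost_data_set` lines up only if it is known to contain exactly those test rows, in the same order and with the same index.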