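I'm working on a regression problem and want to modify the loss function in the xgboost library so that my predictions are never lower than the actual values. I wrote this code: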
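```python
import pandas as pd

def custom_loss(preds, dtrain):
    labels = dtrain.get_label()
    # residual: positive when over-predicting, negative when under-predicting
    df = preds - labels
    df = pd.DataFrame(df, columns=['val'])
    # under-predictions (x < 0) become 10*|x|; other residuals are kept as-is
    df['valg'] = df['val'].apply(lambda x: 10*abs(x) if x<0 else x)
    grad = df['valg'].to_numpy()  # .as_matrix() was removed in pandas 1.0
    return preds-labels, grad
```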
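Essentially, I want to penalize predictions that are lower than the actual values more heavily than those that are higher. However, this doesn't work: my predictions don't improve at all. Can anyone help me figure out where I went wrong? Thanks.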
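For comparison, here is a minimal sketch of an asymmetric squared-error objective written in the shape xgboost's custom-objective hook (`obj=` in `xgb.train`) consumes: the function returns the per-row gradient and Hessian, i.e. the first and second derivatives of the loss with respect to the prediction, in that order. The names `asymmetric_grad_hess` and `custom_obj` and the weight of 10 are illustrative assumptions, not part of the original script. Note that for under-predictions this gradient is negative (`weight * residual`), whereas `10*abs(x)` above is always positive.

```python
import numpy as np

def asymmetric_grad_hess(preds, labels, under_weight=10.0):
    """Gradient and Hessian of an asymmetric squared error (illustrative).

    Per-row loss: 0.5 * w * (pred - label)**2, where w = under_weight when
    pred < label (under-prediction) and w = 1 otherwise.
    """
    residual = preds - labels
    weight = np.where(residual < 0, under_weight, 1.0)
    grad = weight * residual  # d(loss)/d(pred)
    hess = weight             # d2(loss)/d(pred)2, constant on each side
    return grad, hess

def custom_obj(preds, dtrain):
    """Hypothetical objective for xgb.train(obj=...): returns (grad, hess)."""
    labels = dtrain.get_label()
    return asymmetric_grad_hess(preds, labels, under_weight=10.0)
```

It would be passed as `obj=custom_obj` in the `xgb.train` call. This only makes under-predictions more expensive, shifting predictions upward; it does not guarantee that every prediction ends up at or above the actual value.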
Edit: the full Python script:
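```python
import pandas as pd
import xgboost as xgb

params = {"booster" : "gbtree",
          "eta": 0.20,
          "max_depth": 4,
          "subsample": 0.75,
          "colsample_bytree": 0.65,
          "verbosity": 0,  # replaces the deprecated "silent": 1
          "eval_metric": "rmse",
          }
num_round = 400

def custom_loss(preds, dtrain):
    labels = dtrain.get_label()
    df = preds - labels
    df = pd.DataFrame(df, columns=['val'])
    # under-predictions (x < 0) become 5*|x|; other residuals are kept as-is
    df['valg'] = df['val'].apply(lambda x: 5*abs(x) if x<0 else x)
    grad = df['valg'].to_numpy()  # .as_matrix() was removed in pandas 1.0
    return preds-labels, grad

# X_train / X_test: pandas DataFrames that include the target column 'price_act'
dtrain = xgb.DMatrix(X_train.drop('price_act', axis=1),
                     label=X_train['price_act'])
dtest = xgb.DMatrix(X_test.drop('price_act', axis=1),
                    label=X_test['price_act'])
watchlist = [(dtrain, 'train'), (dtest, 'eval')]
# keyword arguments: newer xgboost releases no longer accept evals/obj positionally
bst = xgb.train(params, dtrain, num_round, evals=watchlist, obj=custom_loss)
```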
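Because `eval_metric` is `rmse`, the watchlist output only reports squared error, which treats over- and under-predictions alike and so may not reveal whether the objective is pushing predictions upward. A minimal sketch of an extra metric that reports the share of rows predicted below the label, assuming the `(preds, dtrain) -> (name, value)` signature that `xgb.train` uses for its `feval` argument (newer releases name it `custom_metric`; check the installed version's documentation). The function name `under_prediction_rate` is hypothetical.

```python
import numpy as np

def under_prediction_rate(preds, dtrain):
    """Hypothetical custom metric: fraction of rows where pred < label.

    Returns a (name, value) pair; lower is better.
    """
    labels = dtrain.get_label()
    rate = float(np.mean(preds < labels))
    return "under_rate", rate
```

It could be passed alongside the objective, e.g. `xgb.train(params, dtrain, num_round, evals=watchlist, obj=custom_loss, feval=under_prediction_rate)`, so each round prints the under-prediction rate for both the train and eval sets.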