I am trying to optimize a LightGBM model using Optuna.
After reading the documentation, I noticed there are two ways of using it, as described here: LightGBM Tuner: New Optuna Integration for Hyperparameter Optimization.
The first approach uses the "standard" way of optimizing with Optuna (objective function + trials); the second wraps everything inside a .train() function. The first one essentially tries combinations of hyperparameter values, while the second optimizes the hyperparameters following a step-wise approach.
The two approaches are shown in the following code examples from the Optuna GitHub repository:
Both snippets perform exactly the same optimization over the same parameters (the parameters optimized by the second approach are described here), but in different ways (combinatorial vs. step-wise).
My question is:
Is it possible to specify a custom evaluation metric in the second approach? In the first one, I can easily replace the accuracy used in the GitHub example with any custom metric.
As an example, I could write:

```python
import lightgbm as lgb
import numpy as np
import sklearn.datasets
import sklearn.metrics
from sklearn.model_selection import train_test_split

import optuna


def my_eval_metric(valid_y, pred_labels):
    # my custom metric
    ..........
    ..........
    return my_metric


def objective(trial):
    data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
    train_x, valid_x, train_y, valid_y = train_test_split(data, target, test_size=0.25)
    dtrain = lgb.Dataset(train_x, label=train_y)

    param = {
        "objective": "binary",
        "metric": "binary_logloss",
        "verbosity": -1,
        "boosting_type": "gbdt",
        "lambda_l1": trial.suggest_float("lambda_l1", 1e-8, 10.0, log=True),
        "lambda_l2": trial.suggest_float("lambda_l2", 1e-8, 10.0, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 2, 256),
        "feature_fraction": trial.suggest_float("feature_fraction", 0.4, 1.0),
        "bagging_fraction": trial.suggest_float("bagging_fraction", 0.4, 1.0),
        "bagging_freq": trial.suggest_int("bagging_freq", 1, 7),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
    }

    gbm = lgb.train(param, dtrain)
    preds = gbm.predict(valid_x)
    pred_labels = np.rint(preds)
    my_eval_metric_value = my_eval_metric(valid_y, pred_labels)
    return my_eval_metric_value


if __name__ == "__main__":
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=100)

    print("Number of finished trials: {}".format(len(study.trials)))
    print("Best trial:")
    trial = study.best_trial
    print("  Value: {}".format(trial.value))
    print("  Params: ")
    for key, value in trial.params.items():
        print("    {}: {}".format(key, value))
```
This code returns the parameters of the LightGBM model that maximize my custom metric. In the second approach, however, I cannot specify my own custom metric.
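To make the first approach concrete, here is one hypothetical way the `my_eval_metric` placeholder could be filled in — the F1 score is used purely as an illustration (any sklearn metric, or a hand-written one, would work the same way):

```python
import numpy as np
import sklearn.datasets
import sklearn.metrics
from sklearn.model_selection import train_test_split


# Hypothetical concrete version of the custom metric placeholder above:
# the F1 score stands in for whatever metric you actually need.
def my_eval_metric(valid_y, pred_labels):
    return sklearn.metrics.f1_score(valid_y, pred_labels)


data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
train_x, valid_x, train_y, valid_y = train_test_split(
    data, target, test_size=0.25, random_state=0
)

# Stand-in for the rounded predictions of a trained booster,
# just so the metric can be evaluated without training a model.
pred_labels = np.ones_like(valid_y)
score = my_eval_metric(valid_y, pred_labels)
print(score)  # a float in [0, 1]
```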
Update:
I managed to define my own custom metric and use it in the second approach. A minimal reproducible example follows (just pass in your data using scikit-learn's train_test_split):
```python
from sklearn.metrics import average_precision_score, log_loss

import optuna.integration.lightgbm as lgb_sequential


def tune_lightGBM_sequential(X_train, X_val, y_train, y_val):

    def calculate_ctr(gt):
        positive = len([x for x in gt if x == 1])
        ctr = positive / float(len(gt))
        return ctr

    def compute_rce(preds, train_data):
        gt = train_data.get_label()
        cross_entropy = log_loss(gt, preds)
        data_ctr = calculate_ctr(gt)
        strawman_cross_entropy = log_loss(gt, [data_ctr for _ in range(len(gt))])
        rce = (1.0 - cross_entropy / strawman_cross_entropy) * 100.0
        return ('rce', rce, True)

    def compute_avg_precision(preds, train_data):
        gt = train_data.get_label()
        avg_precision = average_precision_score(gt, preds)
        return ('avg_precision', avg_precision, True)

    params = {
        "objective": "binary",
        "metric": 'custom',
        "boosting_type": "gbdt",
        "verbose": 2,
    }

    dtrain = lgb_sequential.Dataset(X_train, label=y_train)
    dval = lgb_sequential.Dataset(X_val, label=y_val)

    print('Starting training lightGBM sequential')
    model = lgb_sequential.train(
        params,
        dtrain,
        valid_sets=[dtrain, dval],
        verbose_eval=True,
        num_boost_round=2,
        early_stopping_rounds=100,
        feval=[compute_rce, compute_avg_precision],
    )
    return model.params
```
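For reference, LightGBM expects each `feval` callable to return an `(eval_name, eval_result, is_higher_better)` tuple. The two metrics above can be checked standalone with a tiny stub in place of a real `Dataset` (the stub is an assumption purely for illustration; it only mimics `get_label()`):

```python
from sklearn.metrics import log_loss


# Hypothetical stand-in for lightgbm.Dataset exposing only get_label(),
# so the metric can run without training a model.
class FakeDataset:
    def __init__(self, labels):
        self._labels = labels

    def get_label(self):
        return self._labels


def calculate_ctr(gt):
    return len([x for x in gt if x == 1]) / float(len(gt))


def compute_rce(preds, train_data):
    gt = train_data.get_label()
    cross_entropy = log_loss(gt, preds)
    data_ctr = calculate_ctr(gt)
    strawman_cross_entropy = log_loss(gt, [data_ctr for _ in range(len(gt))])
    rce = (1.0 - cross_entropy / strawman_cross_entropy) * 100.0
    return ('rce', rce, True)  # (name, value, higher_is_better)


preds = [0.9, 0.1, 0.8, 0.2]
data = FakeDataset([1, 0, 1, 0])
name, value, higher_is_better = compute_rce(preds, data)
print(name, value, higher_is_better)
```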
However, Optuna does not seem able to select the best trial based on my custom metric; in fact, I get the following error:
```
[W 2021-05-16 15:56:48,759] Trial 0 failed because of the following error: KeyError('custom')
Traceback (most recent call last):
  File "C:\Users\Mattia\anaconda3\envs\rec_sys_challenge\lib\site-packages\optuna\_optimize.py", line 217, in _run_trial
    value_or_values = func(trial)
  File "C:\Users\Mattia\anaconda3\envs\rec_sys_challenge\lib\site-packages\optuna\integration\_lightgbm_tuner\optimize.py", line 251, in __call__
    val_score = self._get_booster_best_score(booster)
  File "C:\Users\Mattia\anaconda3\envs\rec_sys_challenge\lib\site-packages\optuna\integration\_lightgbm_tuner\optimize.py", line 118, in _get_booster_best_score
    val_score = booster.best_score[valid_name][metric]
KeyError: 'custom'
```
This seems to be an issue with the library (you can find more information here: GitHub issue). I tried many of the suggested workarounds, but none of them worked.
Any help?