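I am trying to run parameter optimization with HYPEROPT, but the printed best loss value never changes.
I tried flipping the sign of the accuracy, but that did not help. When I test the model with my own random trials, the results are much better. How can I optimize the parameters?
I followed this notebook.
Minimal code example: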
import pandas as pd
from sklearn.metrics import roc_auc_score
from hyperopt import STATUS_OK, Trials, fmin, hp, tpe
import xgboost as xgb


def objective(space):
    clf = xgb.XGBClassifier(
        n_estimators=space['n_estimators'],
        max_depth=int(space['max_depth']),
        gamma=space['gamma'],
        reg_alpha=int(space['reg_alpha']),
        min_child_weight=int(space['min_child_weight']),
        colsample_bytree=int(space['colsample_bytree']))
    evaluation = [(train, train_labels), (test, test_labels)]
    clf.fit(train, train_labels,
            eval_set=evaluation, eval_metric="auc",
            early_stopping_rounds=10, verbose=True)
    pred = clf.predict(test)
    accuracy = roc_auc_score(test_labels, pred)
    print("ROC:", accuracy)
    return {'loss': -accuracy, 'status': STATUS_OK}


space = {'max_depth': hp.quniform("max_depth", 3, 300, 1),
         'gamma': hp.uniform('gamma', 1, 9),
         'reg_alpha': hp.quniform('reg_alpha', 5, 180, 1),
         'reg_lambda': hp.uniform('reg_lambda', 0, 1),
         'colsample_bytree': hp.uniform('colsample_bytree', 0.1, 1),
         'min_child_weight': hp.quniform('min_child_weight', 0, 10, 1),
         'n_estimators': 300,
         'seed': 0
         }

train, train_labels, train_Ids = pd.read_csv("train.csv")
test, test_labels, test_Ids = pd.read_csv("test.csv")

trials = Trials()
best_hyperparams = fmin(fn=objective,
                        space=space,
                        algo=tpe.suggest,
                        max_evals=400,
                        trials=trials)
print("The best hyperparameters are : ", "\n")
print(best_hyperparams)
The same result repeats from the start on every iteration, for example:
2%|▏ | 9/400 [00:07<05:31, 1.18trial/s, best loss: -0.5]
...
5%|▍ | 19/400 [00:17<05:58, 1.06trial/s, best loss: -0.5]
...
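
Two things in the code above may be worth checking. This is a sketch under stated assumptions, not a confirmed diagnosis. First, `int(space['colsample_bytree'])` truncates a float drawn from the range 0.1 to 1, so the classifier would receive 0 on practically every trial. Second, `clf.predict` returns hard class labels, and if those labels are identical for every row, any ROC AUC is exactly 0.5, which would match a best loss stuck at -0.5. The snippet is pure Python: `random.uniform` stands in for `hp.uniform`, `pairwise_auc` is a hypothetical helper written only for this illustration, and the labels and scores are made up. How xgboost itself reacts to `colsample_bytree=0` or to the large `reg_alpha` and `gamma` ranges is not shown here and remains an assumption.

```python
import random

# Assumption: random.uniform stands in for hp.uniform('colsample_bytree', 0.1, 1).
# No hyperopt or xgboost call is made in this sketch.
random.seed(0)
sampled = [random.uniform(0.1, 1.0) for _ in range(1000)]

# int() truncates toward zero, so every draw below 1.0 becomes 0.
truncated = {int(v) for v in sampled}
print("distinct values after int():", truncated)


def pairwise_auc(labels, scores):
    """Hypothetical helper: share of (positive, negative) pairs in which the
    positive example gets the higher score; a tie counts as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and scores, for illustration only.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]          # probability-like output
hard = [1 if s >= 0.5 else 0 for s in scores]      # thresholded class labels
constant = [0] * len(labels)                       # same label for every row

auc_scores = pairwise_auc(labels, scores)
auc_hard = pairwise_auc(labels, hard)
auc_constant = pairwise_auc(labels, constant)
print("AUC from scores:", auc_scores)
print("AUC from hard labels:", auc_hard)
print("AUC from constant labels:", auc_constant)
```

If this is what is happening, passing `colsample_bytree` through without `int()` and scoring with `clf.predict_proba(test)[:, 1]` instead of `clf.predict(test)` would be the first changes to try.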