python - Why is the best loss not updating?

Question

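I am trying to run a parameter optimization with HYPEROPT, but the printed best loss value never changes.

I tried flipping the sign of the accuracy, but that did not help. When I tested the model in my own random trials, the results were much better. How can I optimize the parameters?

Minimal code example: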

import pandas as pd
from sklearn.metrics import roc_auc_score
from hyperopt import STATUS_OK, Trials, fmin, hp, tpe
import xgboost as xgb

def objective(space):
    clf = xgb.XGBClassifier(
        n_estimators=space['n_estimators'], max_depth=int(space['max_depth']), gamma=space['gamma'],
        reg_alpha=int(space['reg_alpha']), min_child_weight=int(space['min_child_weight']),
        colsample_bytree=int(space['colsample_bytree']))

    evaluation = [(train, train_labels), (test, test_labels)]

    clf.fit(train, train_labels,
            eval_set=evaluation, eval_metric="auc",
            early_stopping_rounds=10, verbose=True)

    pred = clf.predict(test)
    accuracy = roc_auc_score(test_labels, pred)
    print("ROC:", accuracy)
    return {'loss': -accuracy, 'status': STATUS_OK}

space = {'max_depth': hp.quniform("max_depth", 3, 300, 1),
         'gamma': hp.uniform('gamma', 1, 9),
         'reg_alpha': hp.quniform('reg_alpha', 5, 180, 1),
         'reg_lambda': hp.uniform('reg_lambda', 0, 1),
         'colsample_bytree': hp.uniform('colsample_bytree', 0.1, 1),
         'min_child_weight': hp.quniform('min_child_weight', 0, 10, 1),
         'n_estimators': 300,
         'seed': 0
         }

train, train_labels, train_Ids = pd.read_csv("train.csv")
test, test_labels, test_Ids = pd.read_csv("test.csv")

trials = Trials()

best_hyperparams = fmin(fn=objective,
                        space=space,
                        algo=tpe.suggest,
                        max_evals=400,
                        trials=trials)

print("The best hyperparameters are : ", "\n")
print(best_hyperparams)

The output repeats the same best loss on every iteration, for example:

2%|▏         | 9/400 [00:07<05:31,  1.18trial/s, best loss: -0.5]
...
5%|▍         | 19/400 [00:17<05:58,  1.06trial/s, best loss: -0.5]
...

score 2 · Accepted Answer

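Without your dataset I can't recreate your exact problem, but I tried it with the sklearn dataset load_breast_cancer. I got scores above 0.5 fairly quickly, but many scores were equal to that baseline score. I think that's because your reg_alpha range is so high that some models end up pruned to nothing. Hopefully, once your optimization samples some smaller alphas, the tpe algorithm will start focusing on more useful values.

You might check: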
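To make the baseline concrete: a model whose trees have all been pruned away predicts the same value for every row, and any constant prediction has a ROC AUC of exactly 0.5, which is a loss of -0.5 here. The sketch below shows this with a small hand-written pairwise AUC. The function `pairwise_auc` and the toy labels and scores are illustrative assumptions, not part of sklearn or of the question's data.

```python
def pairwise_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair, which matches the usual ROC AUC.
    Illustrative helper only; assumes both classes are present.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (assumption): two negatives, three positives.
toy_labels = [0, 0, 1, 1, 1]

# A model pruned to nothing outputs one constant value for every row.
constant_scores = [0.0] * len(toy_labels)
print(pairwise_auc(toy_labels, constant_scores))  # 0.5

# A model that learned something ranks most positives above the negatives.
varied_scores = [0.1, 0.4, 0.35, 0.8, 0.9]
print(pairwise_auc(toy_labels, varied_scores))
```

This is why a loss that sits at exactly -0.5 suggests the trials so far produced models that do not separate the classes at all, rather than a problem with how the loss is reported.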

import numpy as np
alphas = [(trial['misc']['vals']['reg_alpha'][0], trial['result']['loss']) for trial in trials.trials]
print(np.array([alpha for alpha, score in alphas if score == -0.5]).min())
print(np.array([alpha for alpha, score in alphas if score != -0.5]).max())

For me that gives 85.0 and 89.0; there is a little overlap, but broadly speaking, alphas greater than 85 kill the model.
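A slightly more defensive variant of that check, as a sketch: it compares losses with a tolerance instead of exact float equality and does not fail when one of the two groups is empty. The function name `split_alphas_by_baseline` and the records in `example_trials` are hypothetical. The records only imitate the nested layout of the entries in `trials.trials`, and the numbers in them are made up for illustration.

```python
def split_alphas_by_baseline(trial_records, baseline_loss=-0.5, tol=1e-9):
    """Split sampled reg_alpha values by whether the trial's loss sits at the baseline.

    Returns (stuck, learned): alphas whose loss equals baseline_loss within tol,
    and alphas whose loss differs from it.
    """
    stuck, learned = [], []
    for trial in trial_records:
        alpha = trial['misc']['vals']['reg_alpha'][0]
        loss = trial['result']['loss']
        if abs(loss - baseline_loss) <= tol:
            stuck.append(alpha)
        else:
            learned.append(alpha)
    return stuck, learned


# Hypothetical records with made-up values, shaped like entries of trials.trials.
example_trials = [
    {'misc': {'vals': {'reg_alpha': [12.0]}}, 'result': {'loss': -0.97}},
    {'misc': {'vals': {'reg_alpha': [60.0]}}, 'result': {'loss': -0.93}},
    {'misc': {'vals': {'reg_alpha': [95.0]}}, 'result': {'loss': -0.5}},
    {'misc': {'vals': {'reg_alpha': [150.0]}}, 'result': {'loss': -0.5}},
]

stuck, learned = split_alphas_by_baseline(example_trials)
print("smallest alpha stuck at baseline:", min(stuck, default=None))
print("largest alpha that still learned:", max(learned, default=None))
print("fraction of trials stuck:", len(stuck) / len(example_trials))
```

With a real run you would pass `trials.trials` instead of `example_trials`. If most of the stuck trials sit above some alpha, lowering the upper bound of the `reg_alpha` range in `space` is one thing to try.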
