
Here is my code. This is a binary classification problem, and the evaluation metric is the AUC score. I found a solution on Stack Overflow and implemented it, but it didn't work and still gives me an error.

param_grid = {
    'n_estimators': [1000, 10000],
    'boosting_type': ['gbdt'],
    'num_leaves': [30, 35],
    #'learning_rate': [0.01, 0.02, 0.05],
    #'colsample_bytree': [0.8, 0.95],
    'subsample': [0.8, 0.95],
    'is_unbalance': [True, False],
    #'reg_alpha': [0.01, 0.02, 0.05],
    #'reg_lambda': [0.01, 0.02, 0.05],
    'min_split_gain': [0.01, 0.02, 0.05]
}

lgb = LGBMClassifier(random_state=42, early_stopping_rounds=10, eval_metric='auc', verbose_eval=20)


grid_search = GridSearchCV(lgb, param_grid=param_grid,
                           scoring='roc_auc', cv=5, n_jobs=-1, verbose=1)

grid_search.fit(X_train, y_train, eval_set=(X_val, y_val))

best_model = grid_search.best_estimator_
start = time()
best_model.fit(X_train, y_train)
Train_time = round(time() - start, 4)

The error occurs at best_model.fit(X_train, y_train).

1 Answer

This error occurs because you used early stopping during the grid search, but then chose not to use early stopping when fitting the best model on the full dataset.

Some of the keyword arguments you pass into LGBMClassifier are added to params on the model object produced by training, including early_stopping_rounds.

To disable early stopping, you can use set_params().

best_model = grid_search.best_estimator_

# ---------------- my added code -----------------------#
# inspect current parameters
params = best_model.get_params()
print(params)

# remove early_stopping_rounds
params["early_stopping_rounds"] = None
best_model.set_params(**params)
# ------------------------------------------------------#

best_model.fit(X_train, y_train)

More details

I made some assumptions to turn your question into a minimal reproducible example. In the future, I recommend doing that when you ask questions here. It will help you get better answers, faster.

I installed lightgbm 3.1.0 with pip install lightgbm==3.1.0. I'm using Python 3.8.3 on a Mac.

I changed a few things from your example to make it easier to work with:

  • removed the commented-out code
  • reduced n_estimators to [10, 100] and num_leaves to [8, 10] so that training runs faster
  • added imports
  • added a specific dataset and the code to generate it reproducibly

Reproducible example

from lightgbm import LGBMClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split

param_grid = {
    'n_estimators': [10, 100],
    'boosting_type': ['gbdt'],
    'num_leaves': [8, 10],
    'subsample': [0.8, 0.95],
    'is_unbalance': [True, False],
    'min_split_gain': [0.01, 0.02, 0.05]
}

lgb = LGBMClassifier(
    random_state=42,
    early_stopping_rounds=10,
    eval_metric='auc',
    verbose_eval=20
)

grid_search = GridSearchCV(
    lgb,
    param_grid=param_grid,
    scoring='roc_auc',
    cv=5,
    n_jobs=-1,
    verbose=1
)

X, y = load_breast_cancer(return_X_y=True)


X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.1,
    random_state=42
)

grid_search.fit(
    X_train,
    y_train,
    eval_set=(X_test, y_test)
)

best_model = grid_search.best_estimator_

# ---------------- my added code -----------------------#
# inspect current parameters
params = best_model.get_params()
print(params)

# remove early_stopping_rounds
params["early_stopping_rounds"] = None
best_model.set_params(**params)
# ------------------------------------------------------#

best_model.fit(X_train, y_train)
answered 2020-11-29T03:44:59.743