When I train a model with LightGBM like this:
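import lightgbm as lgb

# dtrain, dvalid, predictors, target, categorical_features, lgb_params,
# num_boost_round, early_stopping_rounds, feval and metrics are defined earlier (not shown).

# Wrap the training and validation frames as LightGBM datasets.
xgtrain = lgb.Dataset(dtrain[predictors].values, label=dtrain[target].values,
                      feature_name=predictors,
                      categorical_feature=categorical_features)
xgvalid = lgb.Dataset(dvalid[predictors].values, label=dvalid[target].values,
                      feature_name=predictors,
                      categorical_feature=categorical_features)

# lgb.train fills this dict with the per-round metric values.
evals_results = {}
bst1 = lgb.train(lgb_params,
                 xgtrain,
                 valid_sets=[xgtrain, xgvalid],
                 valid_names=['train', 'valid'],
                 evals_result=evals_results,
                 num_boost_round=num_boost_round,
                 early_stopping_rounds=early_stopping_rounds,
                 verbose_eval=10,
                 feval=feval)

# Report the best round found by early stopping and its validation score.
n_estimators = bst1.best_iteration
print("\nModel Report")
print("n_estimators : ", n_estimators)
print(metrics + ":", evals_results['valid'][metrics][n_estimators - 1])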
I ran the code twice. Everything was identical except for the order of the names in predictors:
(1) First run:
predictors = ['context_page_id', 'item_city_id', 'item_collected_level', 'item_price_level', 'item_pv_level', 'item_sales_level', 'shop_review_num_level', 'shop_review_positive_rate', 'shop_score_delivery', 'shop_score_description', 'shop_score_service', 'shop_star_level', 'user_age_level', 'user_gender_id', 'user_occupation_id', 'user_star_level', 'category_1', 'category_2', 'min', 'hour', 'day', 'week', 'buy_item', 'buy_shop', 'buy_brand', 'browse_total', 'buy_total', 'browse_buy_rate', 'item_browse', 'item_buy', 'item_browse_buy_rate', 'shop_browse', 'shop_buy', 'shop_browse_buy_rate', 'hour_bin_1', 'hour_bin_2', 'hour_bin_3', 'is_new_user_0', 'is_new_user_1', 'is_new_item_0', 'is_new_item_1', 'is_new_shop_0', 'is_new_shop_1', 'is_new_brand_0', 'is_new_brand_1']
(2) Second run:
predictors = ['browse_buy_rate', 'browse_total', 'buy_brand', 'buy_item', 'buy_shop', 'buy_total', 'category_1', 'category_2', 'context_page_id', 'day', 'hour', 'item_browse', 'item_browse_buy_rate', 'item_buy', 'item_city_id', 'item_collected_level', 'item_price_level', 'item_pv_level', 'item_sales_level', 'min', 'shop_browse', 'shop_browse_buy_rate', 'shop_buy', 'shop_review_num_level', 'shop_review_positive_rate', 'shop_score_delivery', 'shop_score_description', 'shop_score_service', 'shop_star_level', 'user_age_level', 'user_gender_id', 'user_occupation_id', 'user_star_level', 'week', 'hour_bin_1', 'hour_bin_2', 'hour_bin_3', 'is_new_user_0', 'is_new_user_1', 'is_new_item_0', 'is_new_item_1', 'is_new_shop_0', 'is_new_shop_1', 'is_new_brand_0', 'is_new_brand_1']
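A quick sanity check that two such lists hold exactly the same features and differ only in order could look like the sketch below. The helper name `same_features` is hypothetical, and the two short lists are illustrative stand-ins built from a few of the names above, not the full lists.

```python
from collections import Counter


def same_features(a, b):
    """Return True if both lists hold the same feature names, ignoring order."""
    return Counter(a) == Counter(b)


# Illustrative stand-ins: a few names from the post, in two different orders.
first_run = ['context_page_id', 'item_city_id', 'item_collected_level']
second_run = ['item_city_id', 'item_collected_level', 'context_page_id']

print("same features:", same_features(first_run, second_run))
print("same order:", first_run == second_run)
```

Passing the two full predictors lists to the helper instead of the stand-ins gives the same kind of check for the real runs.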
Only the order of the features changed, yet the results differ:
(1) First run:
...
[1030] train's binary_logloss: 0.0781902 valid's binary_logloss: 0.0821433
Early stopping, best iteration is:
[837] train's binary_logloss: 0.0799938 valid's binary_logloss: 0.0820824
Model Report
n_estimators : 837
binary_logloss: 0.08208239967439723
(2) Second run:
...
[930] train's binary_logloss: 0.0792041 valid's binary_logloss: 0.0821642
Early stopping, best iteration is:
[738] train's binary_logloss: 0.0810454 valid's binary_logloss: 0.0821186
Model Report
n_estimators : 738
binary_logloss: 0.08211859038553634
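For scale, the gap between the two runs can be worked out from the reported numbers. This is a small arithmetic sketch. The dictionary layout and variable names are only for illustration, and the values are copied from the two Model Report outputs.

```python
# Values copied from the two "Model Report" outputs.
run1 = {"best_iteration": 837, "valid_logloss": 0.08208239967439723}
run2 = {"best_iteration": 738, "valid_logloss": 0.08211859038553634}

# Absolute and relative difference in validation log loss.
abs_gap = run2["valid_logloss"] - run1["valid_logloss"]
rel_gap = abs_gap / run1["valid_logloss"]

# Difference in the round chosen by early stopping.
iter_gap = run1["best_iteration"] - run2["best_iteration"]

print(f"logloss gap: {abs_gap:.3e} ({rel_gap:.3%} relative)")
print(f"best-iteration gap: {iter_gap} rounds")
```

The validation log loss differs by about 3.6e-05, while the best iteration differs by 99 rounds.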
Can anyone explain why this happens? Thank you.