python - Does a different feature order lead to different results when using LightGBM?

Question

When I train a model with LightGBM as follows:

import lightgbm as lgb

# Build the training and validation datasets from the same predictor list.
xgtrain = lgb.Dataset(dtrain[predictors].values, label=dtrain[target].values,
                      feature_name=predictors,
                      categorical_feature=categorical_features
                      )
xgvalid = lgb.Dataset(dvalid[predictors].values, label=dvalid[target].values,
                      feature_name=predictors,
                      categorical_feature=categorical_features
                      )

# Filled in by lgb.train with the per-iteration metric values.
evals_results = {}

# Train with early stopping, monitoring both the training and validation sets.
bst1 = lgb.train(lgb_params,
                 xgtrain,
                 valid_sets=[xgtrain, xgvalid],
                 valid_names=['train','valid'],
                 evals_result=evals_results,
                 num_boost_round=num_boost_round,
                 early_stopping_rounds=early_stopping_rounds,
                 verbose_eval=10,
                 feval=feval)

# Report the best iteration and the validation metric at that iteration.
n_estimators = bst1.best_iteration
print("\nModel Report")
print("n_estimators : ", n_estimators)
print(metrics+":", evals_results['valid'][metrics][n_estimators-1])

I ran the code twice, with everything identical except for the order of predictors:

(1) The first time,

predictors = ['context_page_id', 'item_city_id', 'item_collected_level', 'item_price_level', 'item_pv_level', 'item_sales_level', 'shop_review_num_level', 'shop_review_positive_rate', 'shop_score_delivery', 'shop_score_description', 'shop_score_service', 'shop_star_level', 'user_age_level', 'user_gender_id', 'user_occupation_id', 'user_star_level', 'category_1', 'category_2', 'min', 'hour', 'day', 'week', 'buy_item', 'buy_shop', 'buy_brand', 'browse_total', 'buy_total', 'browse_buy_rate', 'item_browse', 'item_buy', 'item_browse_buy_rate', 'shop_browse', 'shop_buy', 'shop_browse_buy_rate', 'hour_bin_1', 'hour_bin_2', 'hour_bin_3', 'is_new_user_0', 'is_new_user_1', 'is_new_item_0', 'is_new_item_1', 'is_new_shop_0', 'is_new_shop_1', 'is_new_brand_0', 'is_new_brand_1']

(2) The second time,

predictors = ['browse_buy_rate', 'browse_total', 'buy_brand', 'buy_item', 'buy_shop', 'buy_total', 'category_1', 'category_2', 'context_page_id', 'day', 'hour', 'item_browse', 'item_browse_buy_rate', 'item_buy', 'item_city_id', 'item_collected_level', 'item_price_level', 'item_pv_level', 'item_sales_level', 'min', 'shop_browse', 'shop_browse_buy_rate', 'shop_buy', 'shop_review_num_level', 'shop_review_positive_rate', 'shop_score_delivery', 'shop_score_description', 'shop_score_service', 'shop_star_level', 'user_age_level', 'user_gender_id', 'user_occupation_id', 'user_star_level', 'week', 'hour_bin_1', 'hour_bin_2', 'hour_bin_3', 'is_new_user_0', 'is_new_user_1', 'is_new_item_0', 'is_new_item_1', 'is_new_shop_0', 'is_new_shop_1', 'is_new_brand_0', 'is_new_brand_1']
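The two lists are meant to contain the same features. A quick way to verify that two predictor lists differ only in order is to compare their contents while ignoring position. The sketch below uses two shortened, hypothetical lists (`order_a`, `order_b`) in place of the full lists above; the helper name `same_features` is only for illustration.

```python
from collections import Counter

def same_features(a, b):
    """Return True if a and b contain the same names with the same counts."""
    return Counter(a) == Counter(b)

# Shortened, hypothetical stand-ins for the two predictor lists.
order_a = ['context_page_id', 'item_city_id', 'hour', 'day']
order_b = ['day', 'hour', 'context_page_id', 'item_city_id']

print(same_features(order_a, order_b))  # True: same features
print(order_a == order_b)               # False: different order
```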

Only the order changed, but the results are different:

(1) First run:

...
[1030]  train's binary_logloss: 0.0781902   valid's binary_logloss: 0.0821433
Early stopping, best iteration is:
[837]   train's binary_logloss: 0.0799938   valid's binary_logloss: 0.0820824

Model Report
n_estimators :  837
binary_logloss: 0.08208239967439723

(2) Second run:

...
[930]   train's binary_logloss: 0.0792041   valid's binary_logloss: 0.0821642
Early stopping, best iteration is:
[738]   train's binary_logloss: 0.0810454   valid's binary_logloss: 0.0821186

Model Report
n_estimators :  738
binary_logloss: 0.08211859038553634
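
For scale, the gap between the two runs can be computed directly from the values printed above. This is plain arithmetic on the reported numbers; the variable names are only for illustration.

```python
# Best validation binary_logloss and best iteration reported by each run.
loss_run1, iters_run1 = 0.08208239967439723, 837
loss_run2, iters_run2 = 0.08211859038553634, 738

abs_diff = loss_run2 - loss_run1
rel_diff = abs_diff / loss_run1

print(f"absolute difference: {abs_diff:.3e}")
print(f"relative difference: {rel_diff:.4%}")
print(f"best-iteration gap:  {iters_run1 - iters_run2} rounds")
```

The validation loss differs by roughly 0.04%, while the early-stopping point moves by 99 rounds.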

Can anyone explain this? Thank you.
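
One thing that may be worth checking is whether `lgb_params` (not shown here) enables randomized column subsampling, for example `feature_fraction` below 1. As an assumption rather than a confirmed diagnosis: if a seeded sampler picks columns by position, the same seed selects different features once the list is reordered. The sketch below is plain Python, not LightGBM's own code; `sample_features` and the short lists are hypothetical.

```python
import random

def sample_features(names, fraction, seed):
    """Pick a fraction of the columns by position using a seeded RNG."""
    k = max(1, int(len(names) * fraction))
    rng = random.Random(seed)
    picked = sorted(rng.sample(range(len(names)), k))
    return picked, [names[i] for i in picked]

# Hypothetical short lists: same features, second list rotated by one position.
order_a = ['context_page_id', 'item_city_id', 'hour', 'day', 'week', 'min']
order_b = order_a[1:] + order_a[:1]

idx_a, feats_a = sample_features(order_a, fraction=0.5, seed=0)
idx_b, feats_b = sample_features(order_b, fraction=0.5, seed=0)

print(idx_a == idx_b)                # True: the same positions are drawn
print(set(feats_a) == set(feats_b))  # False: those positions hold different names
```

If something like this applies, the two runs would not be training on identical column subsets even though the seed and data are unchanged.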
