Does tree_method = 'exact' really mean that xgboost uses the exact greedy algorithm for split finding?
I am asking because xgboost runs unreasonably fast. Here is the script I used for the test:
from xgboost import XGBRegressor as rr
import numpy as np
from sklearn.model_selection import train_test_split
import pickle
import sys
from time import time

t1 = time()

# Load the pickled data set whose path is given on the command line
data = sys.argv[1]
with open(data, 'rb') as source:
    data = pickle.load(source)
np.random.shuffle(data)

# Each item is a (features, target) pair
x = [item[0] for item in data]
y = [item[1] for item in data]
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size = 0.2, random_state = 100)
eval_set = [(x_train, y_train), (x_val, y_val)]

# Exact model
model_exact = rr(max_depth = 5,
                 n_estimators = 1,
                 silent = False,
                 min_child_weight = 0,
                 tree_method = 'exact')
model_exact.fit(x_train,
                y_train,
                eval_set=eval_set,
                eval_metric="mae",
                early_stopping_rounds=30)

t2 = time()
print(f"Time used: {t2 - t1}")
The pickled data used for the test has been uploaded here.
Each instance has 96 features, and there are 11450 instances in total.
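Running on a single core (1.3 GHz Intel Core i5), xgboost found the first split in 0.9804270267486572s. If xgboost really performed a greedy search over all possible splits, that would mean it evaluated 11450 x 96 = 1099200 splits in only 0.9804270267486572s!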
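To make clear what I mean by a greedy search over all possible splits, here is a rough sketch of how I understand the exact method for a single node. This is only my reading of the algorithm, not xgboost's actual code: the function name best_split_exact, the squared-error gradients (grad = prediction - y, hess = 1), the initial prediction of 0, and the synthetic data are all my own assumptions, and it ignores gamma, min_child_weight and missing values.

```python
import numpy as np

def best_split_exact(X, grad, hess, reg_lambda=1.0):
    """Scan every feature and every candidate threshold of one node.

    Returns (gain, feature_index, threshold) of the best split found,
    or (0.0, -1, nan) if no split has positive gain.
    """
    n, d = X.shape
    best = (0.0, -1, float("nan"))
    if n < 2:
        return best
    G, H = grad.sum(), hess.sum()
    parent_score = G * G / (H + reg_lambda)
    for j in range(d):
        # Sort the instances by the value of feature j
        order = np.argsort(X[:, j], kind="stable")
        xs = X[order, j]
        # Prefix sums give the left-child statistics for every split position
        GL = np.cumsum(grad[order])[:-1]
        HL = np.cumsum(hess[order])[:-1]
        GR, HR = G - GL, H - HL
        gain = 0.5 * (GL ** 2 / (HL + reg_lambda)
                      + GR ** 2 / (HR + reg_lambda)
                      - parent_score)
        # A split is only possible between two different feature values
        gain[xs[1:] == xs[:-1]] = -np.inf
        k = int(np.argmax(gain))
        if gain[k] > best[0]:
            best = (float(gain[k]), j, float((xs[k] + xs[k + 1]) / 2))
    return best

# Synthetic data with the same shape as my data set (not the real data)
rng = np.random.default_rng(0)
X = rng.normal(size=(11450, 96))
y = 10.0 * (X[:, 7] > 0.3) + rng.normal(scale=0.1, size=11450)

grad = 0.0 - y            # squared error with an assumed initial prediction of 0
hess = np.ones_like(y)
gain, feat, thr = best_split_exact(X, grad, hess)
print(feat, thr)
```

With the prefix sums, each feature costs one sort plus one linear pass, so scanning all candidate positions of a node takes roughly n log n work per feature rather than a separate pass over the data for every candidate split.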
Is xgboost really that fast? Or have I misunderstood tree_method = 'exact'?