What are the differences between setting objective='binary:logistic' and objective='binary:logitraw' in an XGBoost classifier?
According to the documentation (https://xgboost.readthedocs.io/en/latest/parameter.html#learning-task-parameters), the former corresponds to "logistic regression for binary classification, output probability", while the latter is "logistic regression for binary classification, output score before logistic transformation".
I am not sure what this means in practice. Could you explain which function is minimized during training in each of the two cases?
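To make the documented difference concrete, my reading is that the two outputs should simply be related by the logistic (sigmoid) transform: the raw score of binary:logitraw, passed through the sigmoid, should give the probability that binary:logistic reports. A tiny sketch of that relationship (the raw_score value below is made up for illustration, not taken from any model):

import numpy as np

def sigmoid(z):
    # logistic transformation: maps a raw score in (-inf, inf) to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

raw_score = 0.85            # hypothetical "output score before logistic transformation"
print(sigmoid(raw_score))   # the corresponding "output probability", ~0.70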
Moreover, setting the objective parameter does not seem to change the model output at all, as the code below shows.
Simulate some data:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xgboost as xgb
x1 = np.random.uniform(low=-3,high=4,size=10000)
x2 = np.random.uniform(low=-3,high=4,size=10000)
x3 = np.random.uniform(low=-3,high=4,size=10000)
X = pd.DataFrame({'x1':x1, 'x2':x2, 'x3':x3})
z = 2 * x1 + 3 * x2 + 4 * x3
def invlogit(z):
    p = 1 / (1 + np.exp(-z))
    return p
pr = invlogit(z)
y = pd.Series(data=np.random.binomial(size=10000, n=1, p=pr))
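A quick sanity check I added (not part of the original simulation): with these coefficients the linear predictor z can range from about -27 to 36, so a large share of the simulated probabilities sit close to 0 or 1.

# quick look at the simulated data (added for context, not in the original code)
print(pr.min(), pr.max())                  # probabilities cover essentially the whole (0, 1) range
print(np.mean((pr < 0.01) | (pr > 0.99)))  # fraction of near-deterministic labels
print(y.mean())                            # empirical positive rate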
Define two classifiers whose parameters are identical except for objective:
params = {'gamma': 1.4,
          'learning_rate': 0.2,
          'max_delta_step': 5.,
          'max_depth': 8,
          'min_child_weight': 2.2,
          'subsample': 0.7,
          'objective': 'binary:logistic',
          'nthread': 4,
          'seed': 2,
          'num_boost_round': 200,
          'reg_alpha': 0,
          'reg_lambda': 0}
clf = xgb.XGBClassifier(**params)
clf.fit(X, y)
tmp1=params.copy()
tmp1['objective']='binary:logitraw'
clf1=xgb.XGBClassifier(**tmp1)
clf1.fit(X, y)
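Before plotting, the two fitted models can also be compared numerically (this check is my addition; it only uses the clf and clf1 fitted above):

# compare the raw margins of the two fitted models directly (added check)
m0 = clf.predict(X, output_margin=True)    # margins from the binary:logistic model
m1 = clf1.predict(X, output_margin=True)   # margins from the binary:logitraw model
print(np.allclose(m0, m1))                 # tests numerically what the plot below suggests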
Plot the predictions against each other (invlogit is the inverse of the logit function, i.e. it maps a raw score to a probability):
plt.plot(invlogit(clf.predict(X, output_margin=True)),
         invlogit(clf1.predict(X, output_margin=True)), '.')
plt.xlabel('binary:logistic');
plt.ylabel('binary:logitraw');
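Beyond the plot, another check I considered (my addition; on older xgboost versions get_booster() may be called booster() instead) is to compare the learned trees themselves:

# dump and compare the fitted tree structures of the two models (added check)
dump0 = clf.get_booster().get_dump()
dump1 = clf1.get_booster().get_dump()
print(len(dump0), len(dump1))   # number of boosted trees in each model
print(dump0 == dump1)           # True would mean the two models learned literally identical trees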
I am puzzled that the results are identical in both cases. Any ideas why? Thanks!