What are the differences between setting objective='binary:logistic' and objective='binary:logitraw' in an XGBoost classifier?
According to the documentation (https://xgboost.readthedocs.io/en/latest/parameter.html#learning-task-parameters), the former corresponds to "logistic regression for binary classification, output probability", while the latter is "logistic regression for binary classification, output score before logistic transformation".
I am not sure what this means in practice. Could you explain which function is minimized during training in each of the two cases?
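To make the documented difference concrete, my reading is that the two outputs should simply be related by the logistic (sigmoid) transform: the raw score of binary:logitraw, passed through the sigmoid, should give the probability that binary:logistic reports. A tiny sketch of that relationship (the raw_score value below is made up for illustration, not taken from any model):

import numpy as np

def sigmoid(z):
    # logistic transformation: maps a raw score in (-inf, inf) to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

raw_score = 0.85            # hypothetical "output score before logistic transformation"
print(sigmoid(raw_score))   # the corresponding "output probability", ~0.70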
Moreover, setting the objective parameter does not seem to change the model output at all, as the code below shows.
Simulate some data:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xgboost as xgb
x1 = np.random.uniform(low=-3,high=4,size=10000)
x2 = np.random.uniform(low=-3,high=4,size=10000)
x3 = np.random.uniform(low=-3,high=4,size=10000)
X = pd.DataFrame({'x1':x1, 'x2':x2, 'x3':x3})
z = 2 * x1 + 3 * x2 + 4 * x3
def invlogit(z):
    p = 1 / (1 + np.exp(-z))
    return p
pr = invlogit(z)
y = pd.Series(data=np.random.binomial(size=10000, n=1, p=pr))
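A quick sanity check I added (not part of the original simulation): with these coefficients the linear predictor z can range from about -27 to 36, so a large share of the simulated probabilities sit close to 0 or 1.

# quick look at the simulated data (added for context, not in the original code)
print(pr.min(), pr.max())                  # probabilities cover essentially the whole (0, 1) range
print(np.mean((pr < 0.01) | (pr > 0.99)))  # fraction of near-deterministic labels
print(y.mean())                            # empirical positive rate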
Define two classifiers whose parameters are identical except for objective:
params = {'gamma': 1.4,
          'learning_rate': 0.2,
          'max_delta_step': 5.,
          'max_depth': 8,
          'min_child_weight': 2.2,
          'subsample': 0.7,
          'objective': 'binary:logistic',
          'nthread': 4,
          'seed': 2,
          'num_boost_round': 200,
          'reg_alpha': 0,
          'reg_lambda': 0}
clf = xgb.XGBClassifier(**params)
clf.fit(X, y)
tmp1=params.copy()
tmp1['objective']='binary:logitraw'
clf1=xgb.XGBClassifier(**tmp1)
clf1.fit(X, y)
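Before plotting, the two fitted models can also be compared numerically (this check is my addition; it only uses the clf and clf1 fitted above):

# compare the raw margins of the two fitted models directly (added check)
m0 = clf.predict(X, output_margin=True)    # margins from the binary:logistic model
m1 = clf1.predict(X, output_margin=True)   # margins from the binary:logitraw model
print(np.allclose(m0, m1))                 # tests numerically what the plot below suggests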
Plot the predictions against each other (invlogit is the inverse of the logit function, i.e. it maps a raw score to a probability):
plt.plot(invlogit(clf.predict(X, output_margin=True)),
         invlogit(clf1.predict(X, output_margin=True)), '.')
plt.xlabel('binary:logistic');
plt.ylabel('binary:logitraw');
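Beyond the plot, another check I considered (my addition; on older xgboost versions get_booster() may be called booster() instead) is to compare the learned trees themselves:

# dump and compare the fitted tree structures of the two models (added check)
dump0 = clf.get_booster().get_dump()
dump1 = clf1.get_booster().get_dump()
print(len(dump0), len(dump1))   # number of boosted trees in each model
print(dump0 == dump1)           # True would mean the two models learned literally identical trees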
I am puzzled that the results are identical in both cases. Any ideas why? Thanks!