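I believe you have confused the objective with the objective function (passed as the obj parameter); the xgboost documentation can be quite confusing at times.
In short, to fix your code you only need to do this: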
m = XGBClassifier(obj=brier, seed=42)
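The brier function itself is not shown in this answer. As a rough illustration only, here is a minimal sketch of what a Brier-score custom objective could look like. It assumes the usual custom-objective shape of taking (y_true, y_pred) and returning a per-sample gradient and Hessian, and it assumes y_pred holds raw margins, so the probability is sigmoid(y_pred). Whether a custom objective receives margins or probabilities has differed between xgboost versions, so check that for the version you run. The name brier_objective is hypothetical.

```python
import numpy as np


def brier_objective(y_true, y_pred):
    """Hypothetical Brier-score objective: returns (grad, hess) per sample.

    Assumption: y_pred holds raw margins, so probabilities are sigmoid(y_pred).
    Loss per sample is (p - y)^2 with p = sigmoid(margin).
    """
    y_true = np.asarray(y_true, dtype=float)
    p = 1.0 / (1.0 + np.exp(-np.asarray(y_pred, dtype=float)))
    dp = p * (1.0 - p)  # derivative of the sigmoid with respect to the margin
    grad = 2.0 * (p - y_true) * dp
    hess = 2.0 * dp * (dp + (p - y_true) * (1.0 - 2.0 * p))
    return grad, hess


# Two samples with margin 0 (p = 0.5), one positive and one negative label.
grad, hess = brier_objective(np.array([1.0, 0.0]), np.array([0.0, 0.0]))
print(grad, hess)
```

Note that this Hessian can be negative for confident wrong predictions (for example y = 1 and p close to 0), so implementations of this kind often clip it to a small positive value.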
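To go a bit deeper: the objective is how xgboost carries out the optimization given an objective function. Normally xgboost infers the objective from the number of classes in your y vector.
I took a snippet from the source code. As you can see, whenever you have only two classes the objective is set to binary:logistic: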
    class XGBClassifier(XGBModel, XGBClassifierBase):
        def __init__(self, objective="binary:logistic", **kwargs):
            super().__init__(objective=objective, **kwargs)

        def fit(self, X, y, sample_weight=None, base_margin=None,
                eval_set=None, eval_metric=None,
                early_stopping_rounds=None, verbose=True, xgb_model=None,
                sample_weight_eval_set=None, callbacks=None):
            evals_result = {}
            self.classes_ = np.unique(y)
            self.n_classes_ = len(self.classes_)

            xgb_options = self.get_xgb_params()  # <-- obj function is set here

            if callable(self.objective):
                # <-- here is the mismatch of the names: if you pass your brier
                #     function as objective, the objective becomes "binary:logistic"
                obj = _objective_decorator(self.objective)
                xgb_options["objective"] = "binary:logistic"
            else:
                obj = None

            if self.n_classes_ > 2:
                xgb_options['objective'] = 'multi:softprob'  # <-- objective is set here if n_classes > 2
                xgb_options['num_class'] = self.n_classes_

            # +-- 35 lines folded: feval = eval_metric if callable(eval_metric) else None

            self._Booster = train(xgb_options, train_dmatrix,  # <-- objective is passed in the xgb_options dictionary
                                  self.get_num_boosting_rounds(),
                                  evals=evals,
                                  early_stopping_rounds=early_stopping_rounds,
                                  evals_result=evals_result, obj=obj, feval=feval,  # <-- obj function is passed to the lower-level API here
                                  verbose_eval=verbose, xgb_model=xgb_model,
                                  callbacks=callbacks)

            # +-- 12 lines folded: self.objective = xgb_options["objective"]

            return self
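The branching in that snippet can be isolated into a small pure-Python sketch. The helper name resolve_objective is hypothetical and the real code wraps the callable with _objective_decorator, which is skipped here; the point is only to show which objective name and which obj callable end up being handed to train().

```python
def resolve_objective(objective, n_classes):
    """Mimic how the quoted fit() decides what it passes to train().

    Returns (objective_name, obj_callable). Hypothetical helper, not part
    of xgboost.
    """
    if callable(objective):
        obj = objective              # the real code wraps this in _objective_decorator
        name = "binary:logistic"     # the callable is replaced by a built-in name
    else:
        obj = None
        name = objective
    if n_classes > 2:
        name = "multi:softprob"      # overrides whatever was chosen above
    return name, obj


def brier(y_true, y_pred):
    """Placeholder standing in for the asker's function."""
    raise NotImplementedError


print(resolve_objective("binary:logistic", 2))
print(resolve_objective(brier, 2)[0])
print(resolve_objective("binary:logistic", 3)[0])
```

In every branch the objective name stays a string from the built-in list, while the callable travels separately as obj.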
There is a fixed list of objectives that you can set:
objective [default=reg:squarederror]
reg:squarederror: regression with squared loss.
reg:squaredlogerror: regression with squared log loss $\frac{1}{2}[\log(pred + 1) - \log(label + 1)]^2$. All input labels are required to be greater than -1. Also, see metric rmsle for possible issue with this objective.
reg:logistic: logistic regression
binary:logistic: logistic regression for binary classification, output probability
binary:logitraw: logistic regression for binary classification, output score before logistic transformation
binary:hinge: hinge loss for binary classification. This makes predictions of 0 or 1, rather than producing probabilities.
count:poisson: Poisson regression for count data, output mean of Poisson distribution. max_delta_step is set to 0.7 by default in Poisson regression (used to safeguard optimization)
survival:cox: Cox regression for right censored survival time data (negative values are considered right censored). Note that predictions are returned on the hazard ratio scale (i.e., as HR = exp(marginal_prediction) in the proportional hazard function h(t) = h0(t) * HR).
multi:softmax: set XGBoost to do multiclass classification using the softmax objective, you also need to set num_class(number of classes)
multi:softprob: same as softmax, but output a vector of ndata * nclass, which can be further reshaped to ndata * nclass matrix. The result contains predicted probability of each data point belonging to each class.
rank:pairwise: Use LambdaMART to perform pairwise ranking where the pairwise loss is minimized
rank:ndcg: Use LambdaMART to perform list-wise ranking where Normalized Discounted Cumulative Gain (NDCG) is maximized
rank:map: Use LambdaMART to perform list-wise ranking where Mean Average Precision (MAP) is maximized
reg:gamma: gamma regression with log-link. Output is a mean of gamma distribution. It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be gamma-distributed.
reg:tweedie: Tweedie regression with log-link. It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be Tweedie-distributed.
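Just to confirm that the objective cannot be your brier function, I manually set the objective to the brier function in the source code, right before the call to the lower-level API: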
    class XGBClassifier(XGBModel, XGBClassifierBase):
        def __init__(self, objective="binary:logistic", **kwargs):
            super().__init__(objective=objective, **kwargs)

        def fit(self, X, y, sample_weight=None, base_margin=None,
                eval_set=None, eval_metric=None,
                early_stopping_rounds=None, verbose=True, xgb_model=None,
                sample_weight_eval_set=None, callbacks=None):

            # +-- 54 lines folded: evals_result = {}

            xgb_options["objective"] = xgb_options["obj"]
            self._Booster = train(xgb_options, train_dmatrix,
                                  self.get_num_boosting_rounds(),
                                  evals=evals,
                                  early_stopping_rounds=early_stopping_rounds,
                                  evals_result=evals_result, obj=obj, feval=feval,
                                  verbose_eval=verbose, xgb_model=xgb_model,
                                  callbacks=callbacks)

            # +-- 14 lines folded: self.objective = xgb_options["objective"]
This raises the following error:
raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [10:09:53] /private/var/folders/z5/mchb9bz51cx3h97nkw9v0wkr0000gn/T/pip-install-kh801rm0/xgboost/xgboost/src/objective/objective.cc:26: Unknown objective function: `<function brier at 0x10b630d08>`
Objective candidate: binary:hinge
Objective candidate: multi:softmax
Objective candidate: multi:softprob
Objective candidate: rank:pairwise
Objective candidate: rank:ndcg
Objective candidate: rank:map
Objective candidate: reg:squarederror
Objective candidate: reg:squaredlogerror
Objective candidate: reg:logistic
Objective candidate: binary:logistic
Objective candidate: binary:logitraw
Objective candidate: reg:linear
Objective candidate: count:poisson
Objective candidate: survival:cox
Objective candidate: reg:gamma
Objective candidate: reg:tweedie
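The shape of that message can be reproduced with a small pure-Python stand-in. This is a sketch, not xgboost's actual lookup code: the set below is copied from the candidates in the error, lookup_objective is a hypothetical helper, and it assumes the parameter value reaches the name lookup as text. Under that assumption a function object turns into a string of the form `<function brier at 0x...>`, which matches no registered name.

```python
# Candidate names copied from the error message above.
KNOWN_OBJECTIVES = {
    "binary:hinge", "multi:softmax", "multi:softprob",
    "rank:pairwise", "rank:ndcg", "rank:map",
    "reg:squarederror", "reg:squaredlogerror", "reg:logistic",
    "binary:logistic", "binary:logitraw", "reg:linear",
    "count:poisson", "survival:cox", "reg:gamma", "reg:tweedie",
}


def lookup_objective(value):
    """Hypothetical stand-in for the by-name lookup; not xgboost's real code."""
    name = str(value)  # assumption: parameters reach the lookup as text
    if name not in KNOWN_OBJECTIVES:
        raise ValueError(f"Unknown objective function: `{name}`")
    return name


def brier(y_true, y_pred):
    """Placeholder; only its identity as a function object matters here."""
    raise NotImplementedError


print(lookup_objective("binary:logistic"))
try:
    lookup_objective(brier)
except ValueError as err:
    print(err)
```

A string such as binary:logistic passes the lookup, while the function object fails it, which is why the callable has to be supplied through a separate channel rather than as the objective name.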