I'd like to start by saying that I'm still very new to xgboost, pandas, and numpy.

I'm currently implementing a custom objective function for XGBoost based on the Kelly criterion. The approach is taken from another post on datascience.stackexchange: https://datascience.stackexchange.com/questions/16186/kelly-criterion-in-xgboost-loss-function

From reading the XGBoost documentation (https://xgboost.readthedocs.io/en/latest/tutorials/custom_metric_obj.html), I need to return the gradient and the Hessian. The gradient of the function is:

$$\frac{-(b+1)\,p + b\,x + 1}{(x-1)(b\,x+1)}$$

The Hessian of the function is:

$$-\frac{b^{2}\,p}{(b\,x+1)^{2}} - \frac{1-p}{(1-x)^{2}}$$

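where:

b = the odds received on the bet

p = the probability of winning

x = the algorithm's prediction

For this purpose, I treat p as a binary variable, 1 or 0, indicating whether the bet won.

Therefore, p = the true outcome, 1 or 0.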
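The two derivatives can be sanity-checked numerically. The sketch below assumes the underlying objective is the Kelly log growth p*log(1 + b*x) + (1 - p)*log(1 - x), which is the function the gradient above integrates to; the sample values for x, p, and b are made up for illustration.

```python
import numpy as np

def kelly_log_growth(x, p, b):
    """Assumed objective: p*log(1 + b*x) + (1 - p)*log(1 - x)."""
    return p * np.log1p(b * x) + (1 - p) * np.log1p(-x)

def gradient(x, p, b):
    """First derivative with respect to x, written as a single fraction."""
    return (-(b + 1) * p + b * x + 1) / ((x - 1) * (b * x + 1))

def hessian(x, p, b):
    """Second derivative with respect to x."""
    return -(b**2 * p) / (b * x + 1) ** 2 - (1 - p) / (1 - x) ** 2

# Illustrative values only (not taken from the sample dataset).
x = np.array([0.05, 0.10, 0.20, 0.30])
p = np.array([0.0, 1.0, 0.0, 1.0])
b = np.array([0.15, 0.31, 0.22, 0.48])
eps = 1e-5

# Central finite differences of the objective and of the gradient.
num_grad = (kelly_log_growth(x + eps, p, b) - kelly_log_growth(x - eps, p, b)) / (2 * eps)
num_hess = (gradient(x + eps, p, b) - gradient(x - eps, p, b)) / (2 * eps)

grad_ok = np.allclose(num_grad, gradient(x, p, b), atol=1e-6)
hess_ok = np.allclose(num_hess, hessian(x, p, b), atol=1e-6)
print(grad_ok, hess_ok)
```

If either check printed False, the closed-form expression and the assumed objective would disagree, which is a quick way to catch a sign or bracket mistake before handing the functions to XGBoost.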

Following the documentation, I wrote the code below; I have also included a small sample dataset:

import numpy as np
import xgboost as xgb

kell_train_data = np.array([0.08396877, 0.07131547, 0.17921676, 0.22317006, 0.06278754, 0.29874458, 0.08079682, 0.13074108, 0.06416036, 0.12209199, 0.10400956, 0.28764891, 0.2913481, 0.09450234, 0.07858831, 0.09246751, 0.17008012, 0.29026032, 0.2741014, 0.05574227])

odds_train = np.array([0.149254, 0.108696, 0.312500, 0.217391, 0.061350, 0.208333, 0.178571, 0.065359, 0.037453, 0.107527, 0.256410, 0.400000, 0.370370,  0.085470, 0.058140, 0.204082, 0.476190, 0.294118, 0.121951, 0.033003])

y_train = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0])

kell_train_data = kell_train_data.reshape(kell_train_data.shape[0], -1)


def gradient(y_pred, y_true, odds = odds_train):
    "Compute gradient of betting function"
    
    
    return (((-(odds+1)*y_true +odds*y_pred+1)/((y_pred-1)(odds*y_pred+1))))

def hessian(y_pred, y_true, odds = odds_train):
    "compute hessian of betting function"
    
    return (-(((odds**2)*y_true )/(odds*y_pred+1)**2)-((1-y_true)/((1-y_pred)**2)))

def kellyobjfunc(y_pred, y_true, odds = odds_train) :
    "kelly objective function for xgboost"
    grad = gradient(y_pred, y_true, odds)
    hess = hessian(y_pred, y_true, odds)
    return grad, hess

kell_mod = xgb.XGBClassifier(objective = kellyobjfunc, maximize = True)

kell_mod.fit(kell_train_data, y_train)

However, when I run the code above, I get the following error:

Traceback (most recent call last):

  File "<ipython-input-623-18279e95b288>", line 1, in <module>
    kell_mod.fit( kell_target, y_train)

  File "C:\Users\USER\Anaconda3\lib\site-packages\xgboost\core.py", line 422, in inner_f
    return f(**kwargs)

  File "C:\Users\USER\Anaconda3\lib\site-packages\xgboost\sklearn.py", line 919, in fit
    callbacks=callbacks)

  File "C:\Users\USER\Anaconda3\lib\site-packages\xgboost\training.py", line 214, in train
    early_stopping_rounds=early_stopping_rounds)

  File "C:\Users\USER\Anaconda3\lib\site-packages\xgboost\training.py", line 101, in _train_internal
    bst.update(dtrain, i, obj)

  File "C:\Users\USERR\Anaconda3\lib\site-packages\xgboost\core.py", line 1285, in update
    grad, hess = fobj(pred, dtrain)

  File "C:\Users\USER\Anaconda3\lib\site-packages\xgboost\sklearn.py", line 49, in inner
    return func(labels, preds)

  File "<ipython-input-621-35f90873cb76>", line 14, in kellyobjfunc
    grad = gradient(y_pred, y_true, odds)

  File "<ipython-input-621-35f90873cb76>", line 5, in gradient
    return (((-(odds+1)*y_true +odds*y_pred+1)/((y_pred-1)(odds*y_pred+1))))

TypeError: 'numpy.ndarray' object is not callable

I'm not sure what is causing this. Any insight or help would be greatly appreciated.

1 Answer

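So I found the error.

In the gradient function, the placement of the parentheses caused the error: the two parenthesized factors in the denominator had no multiplication operator between them, so Python tried to call the array (y_pred-1) as a function.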

def gradient(y_pred, y_true, odds = odds_train):
    "Compute gradient of betting function"
    
    
    return (((-(odds+1)*y_true +odds*y_pred+1)/((y_pred-1)(odds*y_pred+1))))

It should actually be:

def gradient(y_pred, y_true, odds = odds_train):
    "Compute gradient of betting function"
    
    
    return (((-(odds+1) * y_true +odds * y_pred+1)/((y_pred-1)*(odds*y_pred+1))))
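The same failure can be reproduced without XGBoost. In the minimal sketch below (the arrays `a` and `b` are made-up values), two adjacent parenthesized expressions are parsed as a function call, while inserting `*` gives the intended elementwise product.

```python
import numpy as np

a = np.array([0.5, 0.25])
b = np.array([2.0, 4.0])

try:
    # Adjacent parentheses: Python treats (a - 1) as something to call.
    (a - 1)(b + 1)
    message = None
except TypeError as exc:
    message = str(exc)

# With an explicit operator, NumPy multiplies element by element.
product = (a - 1) * (b + 1)

print(message)
print(product)
```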

In addition, the xgb model should be:

kell_mod = xgb.XGBClassifier(obj = kellyobjfunc, maximize = True)

The code now runs successfully.
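One thing that may be worth checking afterwards is whether the custom objective is really being used. As far as I know, the scikit-learn wrapper documents the parameter as `objective` and calls it with the labels first, as `(y_true, y_pred)`, which matches the `func(labels, preds)` frame in the traceback; XGBoost also minimizes its objective, so a function that should be maximized needs its gradient and Hessian negated. The sketch below is written under those assumptions and is not a verified drop-in: `make_kelly_objective` is a hypothetical helper, and the clipping bounds are an arbitrary choice to keep the denominators away from zero.

```python
import numpy as np

def make_kelly_objective(odds, lower=0.0, upper=0.99):
    """Hypothetical helper: build an objective bound to a fixed odds array."""
    odds = np.asarray(odds, dtype=float)

    def kelly_objective(y_true, y_pred):
        # Assumption: clip the predictions so neither denominator hits zero.
        x = np.clip(y_pred, lower, upper)
        # Derivatives of the log growth p*log(1 + b*x) + (1 - p)*log(1 - x).
        grad = odds * y_true / (odds * x + 1) - (1 - y_true) / (1 - x)
        hess = -(odds**2) * y_true / (odds * x + 1) ** 2 - (1 - y_true) / (1 - x) ** 2
        # Negate both so that minimizing the loss maximizes the log growth.
        return -grad, -hess

    return kelly_objective

# Illustrative values only: three bets with their odds, outcomes, and predictions.
odds = np.array([0.3125, 0.217391, 0.37037])
y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.18, 0.22, 0.29])

objective = make_kelly_objective(odds)
grad, hess = objective(y_true, y_pred)
print(grad, hess)
```

A function built this way could be passed as `xgb.XGBClassifier(objective=make_kelly_objective(odds_train))`, provided the odds array has one entry per training row in the same order. The negated Hessian is positive everywhere in the clipped range, which is what a minimizer expects.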

Answered 2021-05-28T15:00:39.790