catboost - How to use the catboost overfitting detector

Question

I am trying to understand the catboost overfitting detector. It is described here:

https://tech.yandex.com/catboost/doc/dg/concepts/overfitting-detector-docpage/#overfitting-detector

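Other gradient boosting packages such as lightgbm and xgboost use a parameter called early_stopping_rounds, which is easy to understand (training stops once the validation error has not decreased for early_stopping_rounds steps).

However, I have a hard time understanding the p_value approach used by catboost. Can anyone explain how this overfitting detector works and when it stops training?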

score 13 · Accepted Answer

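It is not documented on the Yandex website or in the github repository, but if you look carefully at the python code posted to github (specifically here), you will see that the overfitting detector is activated by setting "od_type" in the parameters. Looking through recent commits on github, the catboost developers have also recently implemented a detector type called "Iter", which is similar to the "early_stopping_rounds" parameter used by lightGBM and xgboost. To set the number of rounds to wait after the most recent best iteration before stopping, provide a numeric value in the "od_wait" parameter.

For example: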

fit_param <- list(
  iterations = 500,
  thread_count = 10,
  loss_function = "Logloss",
  depth = 6,
  learning_rate = 0.03,
  od_type = "Iter",
  od_wait = 100
)

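I am using the catboost library with R 3.4.1. I found that setting the "od_type" and "od_wait" parameters in the fit_param list works well for my purposes.

I realize this does not answer your question about how to use the p_value approach, which the catboost developers have also implemented; unfortunately, I can't help you there. Hopefully someone else can explain that setting to both of us.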

score 7 · Accepted Answer

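Catboost now supports early_stopping_rounds: fit method parameters

Sets the overfitting detector type to Iter and stops the training after the specified number of iterations since the iteration with the optimal metric value.

This works very much like early_stopping_rounds in xgboost.

Here is an example: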

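from catboost import CatBoostRegressor, Pool

from sklearn.model_selection import train_test_split
import numpy as np

# Synthetic regression data: one feature strongly correlated with the target.
y = np.random.normal(0, 1, 1000)
X = np.random.normal(0, 1, (1000, 1))
X[:, 0] += y * 2

X_train, X_eval, y_train, y_eval = train_test_split(X, y, test_size=0.1)

# Train on the training split only, so the eval set stays held out.
train_pool = Pool(X_train, y_train)
eval_pool = Pool(X_eval, y_eval)

model = CatBoostRegressor(iterations=1000, learning_rate=0.1)

# Stop once the eval metric has not improved for 10 consecutive iterations.
model.fit(train_pool, eval_set=eval_pool, early_stopping_rounds=10)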

The result should look something like this:

522:    learn: 0.3994718        test: 0.4294720 best: 0.4292901 (514)   total: 957ms    remaining: 873ms
523:    learn: 0.3994580        test: 0.4294614 best: 0.4292901 (514)   total: 958ms    remaining: 870ms
524:    learn: 0.3994495        test: 0.4294806 best: 0.4292901 (514)   total: 959ms    remaining: 867ms
Stopped by overfitting detector  (10 iterations wait)

bestTest = 0.4292900745
bestIteration = 514

Shrink model to first 515 iterations.
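
To make the stopping rule in that log concrete, here is a minimal pure-Python sketch of an Iter-style detector. It assumes the detector simply counts iterations since the best eval metric value, which is consistent with the log above (best at iteration 514, stopped at 524 with a wait of 10). The function name iter_early_stopping is hypothetical and is not part of catboost; the actual implementation may differ in details.

```python
def iter_early_stopping(eval_losses, wait):
    """Simulate a patience-based ("Iter"-style) overfitting detector.

    Assumption: lower loss is better, and training stops at the first
    iteration that is `wait` iterations past the best one seen so far.
    Returns (stop_iteration, best_iteration, best_loss).
    """
    best_loss = float("inf")
    best_iter = -1
    for i, loss in enumerate(eval_losses):
        if loss < best_loss:
            # New best value on the eval set: remember it and keep going.
            best_loss, best_iter = loss, i
        elif i - best_iter >= wait:
            # No improvement for `wait` iterations: stop here.
            return i, best_iter, best_loss
    # The detector never fired; training ran through all iterations.
    return len(eval_losses) - 1, best_iter, best_loss


if __name__ == "__main__":
    losses = [0.50, 0.45, 0.43, 0.44, 0.44, 0.45]
    print(iter_early_stopping(losses, wait=3))  # (5, 2, 0.43)
```

In this toy run the best loss is at iteration 2, and training stops at iteration 5, three iterations later. Keeping only the trees up to the best iteration corresponds to the "Shrink model" line in the log.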

score 0 · Accepted Answer

early_stopping_rounds takes care of both the od_type='Iter' and od_wait parameters. There is no need to set od_type and od_wait separately; just set the early_stopping_rounds parameter.
