r - How is xgboost Cover calculated?

Question

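Can someone explain how the Cover column in the xgboost R package is calculated in the xgb.model.dt.tree function?

The documentation says that Cover "is a metric to measure the number of observations affected by the split".

When you run the following code, given in the xgboost documentation for this function, Cover for node 0 of tree 0 is 1628.2500.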

data(agaricus.train, package='xgboost')

#Both datasets are lists with two items, a sparse matrix and labels
#(labels = outcome column which will be learned).
#Each column of the sparse matrix is a feature in one-hot encoding format.
train <- agaricus.train

bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
               eta = 1, nthread = 2, nround = 2, objective = "binary:logistic")

#agaricus.train$data@Dimnames[[2]] represents the column names of the sparse matrix.
xgb.model.dt.tree(agaricus.train$data@Dimnames[[2]], model = bst)

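There are 6513 observations in the train dataset, so can anyone explain why Cover for node 0 of tree 0 is a quarter of this number (1628.25)?

Also, Cover for node 1 of tree 1 is 788.852 - how is this number calculated?

Any help would be much appreciated. Thanks.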

score 27 · Accepted Answer

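Cover is defined in xgboost as:

the sum of second order gradient of training data classified to the leaf, if it is square loss, this simply corresponds to the number of instances in that branch. Deeper in the tree a node is, lower this metric will be

https://github.com/dmlc/xgboost/blob/f5659e17d5200bd7471a2e735177a81cb8d3012b/R-package/man/xgb.plot.tree.Rd Not particularly well documented...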

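To calculate the cover, we need to know the predictions at that point in the tree, and the second derivative of the loss function with respect to those predictions.

Luckily for us, the prediction for every data point (all 6513 of them) in the 0-0 node in your example is 0.5. This is a global default setting: your first prediction at t=0 is 0.5.

base_score [ default=0.5 ] the initial prediction score of all instances, global bias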

http://xgboost.readthedocs.org/en/latest/parameter.html

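The gradient of binary logistic (which is your objective function) is p - y, where p = your prediction and y = the true label.

Thus, the hessian (which is what we need here) is p*(1-p). Note: the hessian can be determined without y, the true labels.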
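For reference, both expressions follow from differentiating the logistic loss with respect to the raw (log-odds) score. The symbols z for that raw score and \ell for the per-example loss are notation introduced here for illustration; they do not come from the xgboost documentation.

```latex
\begin{aligned}
p &= \sigma(z) = \frac{1}{1 + e^{-z}}, \\
\ell(y, z) &= -\,y \log p - (1 - y)\log(1 - p), \\
\frac{\partial \ell}{\partial z} &= p - y, \\
\frac{\partial^2 \ell}{\partial z^2} &= p\,(1 - p).
\end{aligned}
```

The label y drops out of the second derivative, which is why the cover can be computed from the predictions alone.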

So (bringing it home):

6513 * (.5) * (1 - .5) = 1628.25
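The same arithmetic as a minimal base-R sketch, assuming every row starts at the default base_score of 0.5. The helper name logistic_hessian is made up for illustration and is not part of xgboost.

```r
# Hypothetical helper: hessian of the binary logistic loss at prediction p.
logistic_hessian <- function(p) p * (1 - p)

n_obs <- 6513        # rows in agaricus.train
base_score <- 0.5    # default initial prediction (assumed unchanged)

# Cover of the root node of tree 0: sum of the per-row hessians.
root_cover <- sum(logistic_hessian(rep(base_score, n_obs)))
print(root_cover)
```

Every row contributes 0.25 here, so the root cover is exactly a quarter of the row count.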

In the second tree, the predictions at that point are no longer all 0.5, so let's get the predictions after one tree:

p = predict(bst,newdata = train$data, ntree=1)

head(p)
[1] 0.8471184 0.1544077 0.1544077 0.8471184 0.1255700 0.1544077

sum(p*(1-p))  # sum of the hessians in that node,(root node has all data)
[1] 788.8521

Note that for linear (squared error) regression the hessian is always one, so the cover indicates how many examples are in that leaf.
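To make the contrast concrete, here is a small base-R sketch that sums both hessians over a toy node. The six predictions are the head(p) values printed earlier; the function names are illustrative assumptions, not xgboost APIs.

```r
# Hypothetical helpers returning the per-row hessian for each objective.
hess_logistic <- function(p) p * (1 - p)        # binary logistic loss
hess_squared  <- function(p) rep(1, length(p))  # squared error: constant 1

# Predictions after one tree for the first six training rows (from head(p)).
p <- c(0.8471184, 0.1544077, 0.1544077, 0.8471184, 0.1255700, 0.1544077)

cover_logistic <- sum(hess_logistic(p))  # shrinks as predictions move toward 0 or 1
cover_squared  <- sum(hess_squared(p))   # equals the number of rows in the node

print(c(logistic = cover_logistic, squared = cover_squared))
```

With the logistic objective, confident predictions contribute little to the sum, so the cover of a node can fall well below its row count even when no rows have left it.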

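The big takeaway is that cover is defined by the hessian of the objective function. There is plenty of material available on deriving the gradient and hessian of the binary logistic function.

These slides are helpful for seeing why he uses hessians as a weighting, and they also explain how xgboost splits differently from standard trees. https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf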
