xgboost - Numerical stability of gradient and Hessian computation in LightGBM and XGBoost

Question

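I am investigating the numerical stability of classification in LightGBM and XGBoost. I believe a good place to start is the computation of the gradient and Hessian. Both require evaluating the logistic function, which, as I understand it, can become unstable for very small (large negative) inputs, because exp(-x) can then overflow.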

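Below is XGBoost's implementation of the binary logistic loss. An epsilon value is used when computing the Hessian, but only there. Why is it not needed for the gradient or the sigmoid function? And why does the Hessian need it?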

struct LogisticRegression {
template <typename T>
static T PredTransform(T x) { return common::Sigmoid(x); }
static bool CheckLabel(bst_float x) { return x >= 0.0f && x <= 1.0f; }
template <typename T>
static T FirstOrderGradient(T predt, T label) { return predt - label; }
template <typename T>
static T SecondOrderGradient(T predt, T label) {
  const T eps = T(1e-16f);
  return std::max(predt * (T(1.0f) - predt), eps);
}
static bst_float ProbToMargin(bst_float base_score) {
  CHECK(base_score > 0.0f && base_score < 1.0f)
      << "base_score must be in (0,1) for logistic loss";
  return -std::log(1.0f / base_score - 1.0f);
}
static const char* LabelErrorMsg() {
  return "label must be in [0,1] for logistic regression";
}
static const char* DefaultEvalMetric() { return "rmse"; }
};

// logistic loss for binary classification task.
struct LogisticClassification : public LogisticRegression {
  static const char* DefaultEvalMetric() { return "error"; }
};


inline float Sigmoid(float x) {
   return 1.0f / (1.0f + std::exp(-x));
}

Link to the sigmoid function: https://github.com/dmlc/xgboost/blob/24f527a1c095b24115dc5d54ad35cc25d3bc3032/src/common/math.h
Link to the objective function: https://github.com/dmlc/xgboost/blob/master/src/objective/regression_obj.cc#L37
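To make the concern concrete, here is a minimal sketch (not XGBoost code; every name ending in `Sketch` is hypothetical) that evaluates the quoted formulas at extreme margins in single precision. It assumes IEEE 754 arithmetic without fast-math flags. Under that assumption, an overflowing `std::exp` yields infinity, so the sigmoid saturates to 0 or 1 rather than producing NaN, and the gradient `predt - label` stays bounded. The unclamped Hessian `p * (1 - p)`, however, becomes exactly 0 once the sigmoid saturates, which is the case the `eps` floor in `SecondOrderGradient` guards against.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Same formula as the Sigmoid quoted above (hypothetical copy for illustration).
inline float SigmoidSketch(float x) {
  return 1.0f / (1.0f + std::exp(-x));
}

// Gradient of the logistic loss with respect to the margin: always in [-1, 1]
// as long as predt and label are in [0, 1].
inline float GradSketch(float predt, float label) {
  assert(label >= 0.0f && label <= 1.0f);
  return predt - label;
}

// Hessian without any floor: p * (1 - p). Underflows to 0 when p saturates.
inline float RawHessSketch(float predt) {
  return predt * (1.0f - predt);
}

// Hessian with the same floor as the quoted SecondOrderGradient.
inline float ClampedHessSketch(float predt) {
  const float eps = 1e-16f;
  return std::max(predt * (1.0f - predt), eps);
}
```

For example, `SigmoidSketch(100.0f)` is exactly `1.0f` in single precision, so `RawHessSketch` returns 0 while `ClampedHessSketch` returns `1e-16f`. Since the Newton step for a leaf divides by a sum of Hessians, a sum that is exactly zero would be a problem when no regularization term is added to it; the floor keeps that sum strictly positive.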

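Below is LightGBM's GetGradients implementation for the binary logistic loss. As far as I can tell, no epsilon value like the one in XGBoost's implementation is used. Can this lead to numerical instability?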

void GetGradients(const double* score, score_t* gradients, score_t* hessians) const override {
if (weights_ == nullptr) {
  #pragma omp parallel for schedule(static)
  for (data_size_t i = 0; i < num_data_; ++i) {
    // get label and label weights
    const int is_pos = is_pos_(label_[i]);
    const int label = label_val_[is_pos];
    const double label_weight = label_weights_[is_pos];
    // calculate gradients and hessians
    const double response = -label * sigmoid_ / (1.0f + std::exp(label * sigmoid_ * score[i]));
    const double abs_response = fabs(response);
    gradients[i] = static_cast<score_t>(response * label_weight);
    hessians[i] = static_cast<score_t>(abs_response * (sigmoid_ - abs_response) * label_weight);
  }

Link to the binary logistic loss class: https://github.com/Microsoft/LightGBM/blob/1c92e75d0342989359c469b1ffabc2901038c0f2/src/objective/binary_objective.hpp
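The per-sample arithmetic in the excerpt above can be isolated as a small sketch to see what happens at extreme scores. This is not LightGBM code: `BinaryLoglossSketch` and `GradHessSketch` are hypothetical names, the label is assumed to be encoded as +1 or -1, `sigmoid` stands in for the `sigmoid_` member, and the label weight is left out. Under IEEE 754 double arithmetic the response saturates to 0 or to `sigmoid` in magnitude, so both outputs stay finite, but the per-sample Hessian can be exactly 0.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for one sample's gradient and Hessian.
struct GradHessSketch {
  double grad;
  double hess;
};

// Mirrors the two formulas in the excerpt, without label weights.
// label is assumed to be +1 or -1; sigmoid is assumed to be positive.
inline GradHessSketch BinaryLoglossSketch(double score, int label, double sigmoid) {
  assert(label == 1 || label == -1);
  assert(sigmoid > 0.0);
  // If exp overflows to +inf, the quotient becomes 0 rather than NaN.
  const double response =
      -label * sigmoid / (1.0 + std::exp(label * sigmoid * score));
  const double abs_response = std::fabs(response);
  return {response, abs_response * (sigmoid - abs_response)};
}
```

With `sigmoid = 1.0` and `label = 1`, a score of `1000.0` gives a response of zero magnitude and a score of `-1000.0` gives a response of `-1.0`; in both cases the Hessian is exactly 0. Whether that matters depends on how the Hessians are summed and used downstream.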

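I hope someone can help me with these questions, as I have been struggling with them. If numerical instability can occur, what practical example would trigger it?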

Thank you very much in advance.

score 1 · Accepted Answer

LightGBM uses a different approach to address numerical stability.

LightGBM limits the minimum/maximum value of a leaf: https://github.com/Microsoft/LightGBM/blob/master/include/LightGBM/tree.h#L14
It also adds an epsilon when computing the leaf output; see https://github.com/Microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L76 and https://github.com/Microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L328

As a result, sum_hessians will always be greater than epsilon.
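The two safeguards described in this answer can be sketched roughly as follows. This is an illustration only, not LightGBM's implementation: `LeafOutputSketch`, `kEpsSketch`, and `kMaxLeafSketch` are hypothetical names, the constant values are placeholders rather than the values in the linked source, and the leaf output is assumed to be the usual Newton step `-G / (H + lambda)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Placeholder constants for illustration; see the linked source for real values.
constexpr double kEpsSketch = 1e-15;
constexpr double kMaxLeafSketch = 100.0;

// Newton step for one leaf with both safeguards applied:
// 1) an epsilon added to the Hessian sum so the denominator is never zero,
// 2) the result clamped to [-kMaxLeafSketch, kMaxLeafSketch].
inline double LeafOutputSketch(double sum_gradients, double sum_hessians,
                               double lambda_l2) {
  assert(sum_hessians >= 0.0 && lambda_l2 >= 0.0);
  const double denom = sum_hessians + kEpsSketch + lambda_l2;
  const double raw = -sum_gradients / denom;
  return std::max(-kMaxLeafSketch, std::min(kMaxLeafSketch, raw));
}
```

Even if every per-sample Hessian in a leaf is exactly 0, `LeafOutputSketch(1.0, 0.0, 0.0)` stays finite: the epsilon avoids a division by zero and the clamp bounds the resulting very large step.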
