r - Unable to reproduce test predictions using gbm

Question

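I am building a predictive regression model with gbm. I have a training set and a test set (predefined, not randomly selected). An outline of the code is below.

I have about 600 rows of training data and 150 rows of test data. I know that is not much, but still.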

train <- ....
test <- ....

set.seed(123)
model <- gbm(target ~., data = train,
                distribution = "gaussian",
                n.trees = 4000,
                interaction.depth = 2,
                n.minobsinnode = 5,
                shrinkage = 0.01,
                bag.fraction = 1,
                train.fraction = .95,
                verbose = TRUE
            )

best_iter <- gbm.perf(model)

set.seed(123)
predictions <- predict(model, newdata = test, n.trees = best_iter)

set.seed(123)
predictions <- predict(model, newdata = train, n.trees = best_iter)

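Somehow, when I run the gbm model again and again with exactly the same parameters, I cannot reproduce the predictions on the test set. At the same time, I can always reproduce the predictions on the training set. I also set the seed before building the model and before making predictions. Can someone help me figure out what is going on? Note that the training and test data always stay the same; I do not change them between runs.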

score 0 · Accepted Answer

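Did you ever find the problem? I am using exactly the same modeling approach as you, and the only difference I can see between our code is in how you predict. You could try removing the dependent variable from newdata for both the training and the test set. Also, set n.trees directly; I am not sure how you are obtaining it at the moment. And save the predictions to two separate objects.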

# Drop the response column by name; setdiff() is safe even if "target" is absent
PredEst <- predict(model, newdata = train[setdiff(names(train), "target")], n.trees = 4000)

PredVal <- predict(model, newdata = test[setdiff(names(test), "target")], n.trees = 4000)
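
To narrow down where two runs diverge, it may help to fit the model twice with the same seed and compare the selected iteration, the training predictions, and the test predictions separately. The following is only a sketch under assumptions: `make_data()` and `fit_and_predict()` are hypothetical helpers, and the simulated data merely stands in for your real train and test sets. It does not diagnose the cause by itself.

```r
library(gbm)

# ASSUMPTION: simulated stand-ins for the real train/test sets (not the asker's data)
set.seed(1)
make_data <- function(n) {
  x1 <- runif(n)
  x2 <- runif(n)
  data.frame(x1 = x1, x2 = x2, target = 2 * x1 - x2 + rnorm(n, sd = 0.1))
}
train <- make_data(600)
test  <- make_data(150)

# Hypothetical helper: fit with the settings from the question, then predict
# on both sets using the iteration chosen on the held-out 5% of the training data
fit_and_predict <- function(seed) {
  set.seed(seed)
  model <- gbm(target ~ ., data = train,
               distribution = "gaussian",
               n.trees = 4000,
               interaction.depth = 2,
               n.minobsinnode = 5,
               shrinkage = 0.01,
               bag.fraction = 1,
               train.fraction = 0.95,
               verbose = FALSE)
  best_iter <- gbm.perf(model, plot.it = FALSE, method = "test")
  list(best_iter = best_iter,
       train = predict(model, newdata = train, n.trees = best_iter),
       test  = predict(model, newdata = test,  n.trees = best_iter))
}

run1 <- fit_and_predict(123)
run2 <- fit_and_predict(123)

# Compare each piece separately to see which one, if any, differs between runs
print(identical(run1$best_iter, run2$best_iter))
print(all.equal(run1$train, run2$train))
print(all.equal(run1$test,  run2$test))
```

If best_iter differs between runs, the mismatch comes from the fit or from gbm.perf rather than from predict; if everything matches here but not with your data, the difference is more likely in how train and test are loaded or preprocessed.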
