
I have 303 data points in my training set (see figure). Many of them have a Y value of 0.

[Image: plot of the training data]

Now I want to train a GBM model to predict Y. Here is my model:

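library(caret)  ## train(), trainControl(), createDataPartition(); method = "gbm" also needs the gbm package

## Target (yval) plus predictors; some numeric predictors are log(x + 1)-transformed
train.subset <- data.frame(yval = train$yval,
                           hour = train$hour,
                           daymoment = train$daymoment,
                           year = train$year,
                           log.windspeed = log(train$windspeed + 1),
                           weather = train$weather,
                           workingday = train$workingday,
                           log.temp = log(train$temp + 1),
                           log.atemp = log(train$atemp + 1),
                           log.humidity = log(train$humidity + 1))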

## Hold out about 15% of the rows as a validation set (stratified on yval)
inTrain <- caret::createDataPartition(train.subset$yval,
                                      p = .85, list = FALSE)
train.registered <- train.subset[inTrain, ]

cv.registered <- train.subset[-inTrain, ]

fitControl <- trainControl(## 10-fold CV
                method = "repeatedcv",
                number = 10,
                ## repeated ten times
                repeats = 10)

gbmGrid <- expand.grid(interaction.depth = c(1, 5, 9),
                       n.trees = (5:25)*50,
                       shrinkage = 0.1,
                       ## recent caret versions also require n.minobsinnode (10 is gbm's default)
                       n.minobsinnode = 10)

fit.registered <- train(yval ~ ., data = train.registered, method = "gbm",
                        trControl = fitControl, verbose = FALSE, tuneGrid = gbmGrid)

prediction.registered <- predict(fit.registered, newdata = cv.registered)
## Replace negative predictions with the smallest positive prediction
prediction.registered[prediction.registered < 0] <- min(prediction.registered[prediction.registered > 0])

RMSE <- sqrt(mean((prediction.registered - cv.registered$yval)^2))
RMSE

This gives a fairly high RMSE of about 28.
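Whether an RMSE of about 28 is "high" depends on the scale and spread of yval. A simple reference point is the RMSE of a constant model that predicts the training-set mean for every held-out point. The sketch below assumes that comparison is useful here; the helper names `rmse` and `baseline_rmse` and the toy vectors are hypothetical, not the question's data.

```r
## Root-mean-squared error, same formula as in the question.
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

## Baseline: predict the training-set mean for every held-out observation.
baseline_rmse <- function(train_y, obs) {
  rmse(rep(mean(train_y), length(obs)), obs)
}

## Hypothetical toy values (not the question's data).
train_y <- c(0, 10, 20)
obs     <- c(0, 0, 10, 30)
pred    <- c(2, 0, 12, 25)

rmse(pred, obs)               # sqrt(33 / 4), about 2.87
baseline_rmse(train_y, obs)   # sqrt(600 / 4), about 12.25
```

With the question's objects, the comparison would be `baseline_rmse(train.registered$yval, cv.registered$yval)` against the RMSE of about 28. caret also provides `RMSE(pred, obs)` for the first computation.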

Here is a plot of predicted vs. actual yval on the cross-validation set:

[Image: predicted vs. actual yval on the cross-validation set]

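I don't understand why such a relatively simple curve produces such a large error. Any ideas? Maybe I should take the tuning parameters found by caret and try another package?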
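Before moving the tuning parameters to another package, it may help to see how large the search behind them is. caret stores the winning row of the grid in `fit.registered$bestTune`. The sketch below rebuilds the question's grid and counts candidates and resamples. Adding `n.minobsinnode = 10` is an assumption (gbm's default value, which recent caret versions expect as a grid column); it does not change the count.

```r
## Rebuild the tuning grid from the question.
## n.minobsinnode = 10 is an assumption (gbm's default), added because
## recent caret versions expect this column for method = "gbm".
gbmGrid <- expand.grid(interaction.depth = c(1, 5, 9),
                       n.trees = (5:25) * 50,
                       shrinkage = 0.1,
                       n.minobsinnode = 10)

n_candidates <- nrow(gbmGrid)   # 3 depths x 21 tree counts = 63
n_resamples  <- 10 * 10         # repeatedcv: number = 10, repeats = 10
range(gbmGrid$n.trees)          # 250 to 1250, in steps of 50
n_candidates * n_resamples      # number of resampled performance estimates
```

Each of the 63 candidates is scored on 100 resamples, so the selected `bestTune` row summarizes 6300 resampled RMSE estimates.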

In case this information helps:

> summary(fit.registered)

                        var   rel.inf
hour                   hour 23.385420
log.atemp         log.atemp 12.959972
daymoment.C     daymoment.C 11.605700
log.humidity   log.humidity 10.972162
log.windspeed log.windspeed  9.627754
daymoment.L     daymoment.L  7.517074
daymoment^4     daymoment^4  4.658695
log.temp           log.temp  4.567798
workingday       workingday  4.135300
daymoment.Q     daymoment.Q  3.766462
year                   year  3.763452
weather             weather  3.040211
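The four `daymoment` rows (`.L`, `.Q`, `.C`, `^4`) are polynomial-contrast columns. With R's default contrasts, the formula interface encodes an ordered factor this way, and four columns imply five levels. The sketch below reproduces the naming with a hypothetical five-level ordered factor (the level names are made up). It also sums the four `rel.inf` values copied from the output above, to estimate daymoment's total influence.

```r
## Hypothetical 5-level ordered factor standing in for daymoment.
lv <- c("night", "morning", "noon", "afternoon", "evening")
daymoment <- factor(lv, levels = lv, ordered = TRUE)

## Under the default options(contrasts = c("contr.treatment", "contr.poly")),
## an ordered factor is expanded into orthogonal polynomial contrasts.
mm <- model.matrix(~ daymoment)
colnames(mm)
## "(Intercept)" "daymoment.L" "daymoment.Q" "daymoment.C" "daymoment^4"

## Combined relative influence of the four daymoment rows in summary(fit.registered).
daymoment_rel_inf <- sum(c(11.605700, 7.517074, 4.658695, 3.766462))
daymoment_rel_inf   # 27.547931, larger than hour's 23.385420
```

Taken together, daymoment accounts for about 27.5% of the relative influence, more than any single predictor in the table.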

Update:

Training set:

Test set:
