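I am running k-fold cross-validation to evaluate the performance of my SVM model. Because of the nature of the data, I also want to apply feature scaling. Here is a snippet of the data: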
# IMPORTING THE DATASET
dataset <- read.csv("imported dataset.csv")
# ENCODING THE DEPENDENT VARIABLE AS A FACTOR
dataset$Purchased <- factor(dataset$Purchased, levels = c(0, 1))
# DATASET
  Age EstimatedSalary Purchased
1  19           19000         0
2  35           20000         0
3  26           43000         0
4  27           57000         0
5  19           76000         0
6  27           58000         0
Here is the rest of the code:
# TRAIN TEST SPLIT
library(caTools)  # provides sample.split
library(caret)    # provides trainControl and train
split = sample.split(dataset$Purchased, SplitRatio = 0.75)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
# K-FOLD CV WITH FEATURE SCALING
trCtrl <- trainControl(method = "repeatedcv",
                       number = 10,   # 10-fold CV
                       repeats = 10,  # repeat the 10-fold CV 10 times
                       savePredictions = TRUE)
model <- train(Purchased ~ .,
               data = training_set,
               method = "svmRadial",
               trControl = trCtrl,
               preProcess = c("center", "scale"))
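I understand that scaling the features first and then running k-fold CV on the original training set causes data leakage: the inner training and validation folds are scaled together, so the scaling statistics include the validation rows, which leads to overfitting.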
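Does the preProcess option in the caret package scale the data in a way that avoids this, i.e. does it scale the inner training and validation folds separately?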
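To make concrete what "scaling the folds separately" means, here is a minimal base R sketch of leak-free scaling: the centering and scaling statistics are computed on the inner training rows only and then applied to the held-out validation rows. This only illustrates the procedure I am asking about and is not meant to show caret's internals. The toy data values, the fold count `k = 5`, and the helper name `scale_fold` are assumptions made up for this example.

```r
# Toy data with the same predictor columns as above (values are made up)
set.seed(42)
n <- 40
toy <- data.frame(
  Age = sample(18:60, n, replace = TRUE),
  EstimatedSalary = sample(seq(15000, 150000, by = 1000), n, replace = TRUE)
)

k <- 5
# Randomly assign each row to one of k equally sized folds
fold_id <- sample(rep(seq_len(k), length.out = n))

# Hypothetical helper: scale one fold without leaking validation statistics
scale_fold <- function(data, fold_id, fold) {
  inner_train <- data[fold_id != fold, , drop = FALSE]
  inner_valid <- data[fold_id == fold, , drop = FALSE]
  # Means and standard deviations come from the inner training rows only
  centers <- colMeans(inner_train)
  sds <- apply(inner_train, 2, sd)
  list(
    train = scale(inner_train, center = centers, scale = sds),
    # The validation rows reuse the training statistics
    valid = scale(inner_valid, center = centers, scale = sds),
    centers = centers,
    sds = sds
  )
}

folds <- lapply(seq_len(k), function(f) scale_fold(toy, fold_id, f))
```

With this scheme each inner training fold has mean 0 and standard deviation 1 per column, while the validation fold generally does not, because its rows never contributed to the statistics.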