下面的代码对训练集和测试集进行缩放。由于 Col6 和 Col7 不得缩放,因此将它们从原始数据中删除以缩放训练集和测试集:
library(tidyverse)
Data_Frame <- data.frame(Col1 = c("A1", "A1", "A1", "A2", "A2", "A2", "A3", "A3", "A3"),
Col2 = c("2011-03-11", "2014-08-21", "2016-01-17", "2017-06-30", "2018-07-11", "2018-11-28", "2019-09-04", "2020-02-29", "2020-07-12"),
Col3 = c("2018-10-22", "2019-05-24", "2020-12-25", "2018-10-12", "2019-09-24", "2020-12-19", "2018-10-22", "2019-06-14", "2020-12-20"),
Col4 = c(4, 12, 2, 1, 4, 4, 75, 4, 44),
Col5 = c(7.81, 6.45, 3, 1, 3, 2, 5, 1, 2),
Col6 = c(1, 1, 1, 1, 1, 1, 1, 1, 1),
Col7 = c(2, 2, 2, 2, 2, 2, 2, 2, 2),
Col8 = c(7.77, 6, 8.4, -11.23, 3.5, 7.2, 15, 100, 22.22))
# randomly split data in r
sample_size = floor(0.8*nrow(Data_Frame))
set.seed(777)
picked = sample(seq_len(nrow(Data_Frame)),size = sample_size)
Train_Set = Data_Frame[picked,]
Test_Set = Data_Frame[-picked,]
# Remove columns Col6 and Col7, which will not be scaled
Train <- Train_Set %>% dplyr::select(- c(Col6, Col7))
Test <- Test_Set %>% dplyr::select(- c(Col6, Col7))
# Scale Train, collect mean and sd to scale in Test
Train_Scale <- Train %>% dplyr::mutate_if(is.numeric, ~scale(.) %>% as.vector)
num_cols <- names(which(sapply(Train,is.numeric)))
scale_params <- attributes(scale(Train[,num_cols]))[c("scaled:center","scaled:scale")]
# Scale Test with the scales of Train
Test_Scale <- Test
Test_Scale[,num_cols] = scale(Test_Scale[,num_cols],center=scale_params[[1]],scale=scale_params[[2]])
试
varnames <- c('Col6', 'Col7')
index <- names(Train_Set) %in% varnames
Train_Scale_Check <- Train_Set[, !index] %>% dplyr::mutate_if(is.numeric, ~scale(.) %>% as.vector)
工作,但从数据框中删除 Col6 和 Col7。
和,
Train_Scale_Check <- Train_Set %>% dplyr::mutate_if(is.numeric, !index, ~scale(.) %>% as.vector)
引发以下错误:
Error: expecting a one sided formula, a function, or a function name.
Run `rlang::last_error()` to see where the error occurred.
rlang::last_error()
<error/rlang_error>
expecting a one sided formula, a function, or a function name.
Backtrace:
1. dplyr::mutate_if(...)
2. dplyr:::manip_if(...)
3. dplyr:::as_fun_list(.funs, .env, ..., .caller = .caller)
4. dplyr:::map(...)
5. base::lapply(.x, .f, ...)
6. dplyr:::FUN(X[[i]], ...)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
expecting a one sided formula, a function, or a function name.
Backtrace:
x
1. \-dplyr::mutate_if(...)
2. \-dplyr:::manip_if(...)
3. \-dplyr:::as_fun_list(.funs, .env, ..., .caller = .caller)
4. \-dplyr:::map(...)
5. \-base::lapply(.x, .f, ...)
6. \-dplyr:::FUN(X[[i]], ...)
有没有一种简单的方法可以在 Train_Set 和 Test_Set 数据集中保留 Col6 和 Col7,但不能对其进行缩放?将列 Col6 和 Col7 提取为单独的数据帧的冗长方法,使用顶部的代码并最终 cbind Col6 和 Col7 数据帧。