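I noticed that using the formula and non-formula interfaces of caret's train() produces different results. In addition, the formula method takes almost 10 times as long as the non-formula method. Is this expected?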
> library(caret)
> library(data.table)
> z <- data.table(c1 = sample(1:1000, 1000, replace = TRUE), c2 = as.factor(sample(LETTERS, 1000, replace = TRUE)))
# SYSTEM TIME WITH FORMULA METHOD
# -------------------------------
> system.time(r <- train(c1 ~ ., z, method = "rf", importance = TRUE))
   user  system elapsed
376.233   9.241  18.190
> r
1000 samples
1 predictors
No pre-processing
Resampling: Bootstrap (25 reps)
Summary of sample sizes: 1000, 1000, 1000, 1000, 1000, 1000, ...
Resampling results across tuning parameters:
  mtry  RMSE  Rsquared  RMSE SD  Rsquared SD
  2     295   0.00114   4.94     0.00154
  13    300   0.00113   5.15     0.00151
  25    300   0.00111   5.16     0.00146
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was mtry = 2.
# SYSTEM TIME WITH NON-FORMULA METHOD
# -------------------------------
> system.time(r <- train(z[, 2, with = FALSE], z$c1, method = "rf", importance = TRUE))
   user  system elapsed
 34.984   2.977   2.708
Warning message:
In randomForest.default(trainX, trainY, mtry = tuneValue$.mtry, :
invalid mtry: reset to within valid range
> r
1000 samples
1 predictors
No pre-processing
Resampling: Bootstrap (25 reps)
Summary of sample sizes: 1000, 1000, 1000, 1000, 1000, 1000, ...
Resampling results
  RMSE  Rsquared  RMSE SD  Rsquared SD
  297   0.00152   6.67     0.00197
Tuning parameter 'mtry' was held constant at a value of 2
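
One thing that may be worth checking is how many predictor columns each interface actually hands to randomForest. The sketch below uses base R only. It assumes that the formula interface expands the factor into dummy columns through model.matrix (which would be consistent with the mtry grid of 2, 13 and 25 above), while the x/y interface passes the single factor column through unchanged. The names x_formula and x_plain are illustrative, a plain data.frame stands in for the data.table, and the factor levels are fixed to all 26 letters so the column count does not depend on the random draw.

```r
# Sketch: compare the predictor matrices the two interfaces would see.
# Assumption: the formula path builds its predictors via model.matrix().
set.seed(1)
z <- data.frame(
  c1 = sample(1:1000, 1000, replace = TRUE),
  c2 = factor(sample(LETTERS, 1000, replace = TRUE), levels = LETTERS)
)

# Formula-style predictors: dummy-coded factor, intercept column dropped.
x_formula <- model.matrix(c1 ~ ., data = z)[, -1, drop = FALSE]

# x/y-style predictors: the factor stays as one column.
x_plain <- z[, "c2", drop = FALSE]

ncol(x_formula)  # number of dummy columns
ncol(x_plain)    # number of original predictor columns
```

If the two counts differ (25 dummy columns versus 1 factor column here), the two calls are fitting different design matrices, which could account for the "invalid mtry" warning, the single mtry value in the second run, and at least part of the difference in run time.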