
I am trying to run the prediction function I got after training my model and after cross validation. I am predicting the variable "classe."

The test data has the same number of predictors, with the same names, as the training data, except that it has fewer rows (20 observations). All of the predictors in the test data are numeric (just like the training data). But prediction fails no matter which model I use.

Model:

rf <- train(train$classe ~., method="rf", data=train, 
        trControl = trainControl(method = "oob"))

I tried:

predict(rf, testing1)

I got

Error in predict.randomForest(modelFit, newdata) : newdata has 0 rows 

then I tried

gbm <- train(train$classe ~., method="gbm", data=train, 
         trControl = trainControl(method = "cv", number=5))

predict(gbm, testing1)

I got

Error in aperm.default(psum, c(2, 1, 3)) : 
'perm' is of wrong length 3 (!= 2) 

My test data looks like this. The only difference is that its last variable is a "problem id", whereas in the training set the last variable is "classe":

> str(testing1)
'data.frame':   20 obs. of  86 variables:
 $ roll_belt              : num  123 1.02 0.87 125 1.35 -5.92 1.2 0.43 0.93 114 ...
 $ pitch_belt             : num  27 4.87 1.82 -41.6 3.33 1.59 4.44 4.15 6.72 22.4 ...
 $ total_accel_belt       : num  20 4 5 17 3 4 4 4 4 18 ...
 $ kurtosis_roll_belt     : num  NA NA NA NA NA NA NA NA NA NA ...
 $ kurtosis_picth_belt    : num  NA NA NA NA NA NA NA NA NA NA ...

 ... # all numeric variables 

 $ magnet_forearm_y       : num  419 791 698 783 -787 800 284 -619 652 723 ...
 $ magnet_forearm_z       : num  617 873 783 521 91 884 585 -32 469 512 ...
 $ problem_id             : num  1 2 3 4 5 6 7 8 9 10 ...

Any help is appreciated!!


2 Answers


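I solved the problem. Some of the values in the test data columns were "NA", whereas in the training data they were blank. There was an inconsistency in how the two files were read into R. After fixing that, predict() now works.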
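A minimal sketch of one way to make the two reads consistent, using base R only. The inline CSV text, the `na_tokens` vector, and the `train_clean` / `test_clean` names are assumptions made for illustration; the thread does not show the real files or the exact fix. The idea is to pass the same `na.strings` to both reads, then keep only predictors that are complete in the training data. The last check hints at why the error said "newdata has 0 rows": as far as I know, caret's `predict()` drops rows containing `NA` by default, and every test row here has at least one.

```r
# Hypothetical stand-ins for the two files: blanks in training, "NA" in test.
train_csv <- "roll_belt,pitch_belt,kurtosis_roll_belt,classe
1.41,8.07,,A
1.42,8.09,,A
123,27.0,,B
125,-41.6,,B"

test_csv <- "roll_belt,pitch_belt,kurtosis_roll_belt,problem_id
123,27.0,NA,1
1.02,4.87,NA,2"

# Read both files with the SAME missing-value tokens so that blanks and
# the literal string "NA" are both treated as missing in both data frames.
na_tokens <- c("NA", "")
train    <- read.csv(text = train_csv, na.strings = na_tokens)
testing1 <- read.csv(text = test_csv,  na.strings = na_tokens)

# Keep only the predictors that have no missing values in the training data.
complete_cols <- names(train)[colSums(is.na(train)) == 0]
predictors    <- setdiff(complete_cols, "classe")

train_clean <- train[, c(predictors, "classe")]
test_clean  <- testing1[, predictors, drop = FALSE]

# Before cleaning, no test row is complete; after cleaning, all of them are.
rows_complete_before <- sum(complete.cases(testing1))
rows_complete_after  <- sum(complete.cases(test_clean))
```

With the cleaned frames, the model would be refit on `train_clean` and `predict()` called on `test_clean`. Writing the formula as `classe ~ .` rather than `train$classe ~ .` is probably also worth doing, since with the latter the `.` likely still expands to include `classe` itself as a predictor.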

answered 2014-10-24T21:17:35.543

For prediction, I know the column names have to be exactly the same. If even the last one is off, that could cause problems.
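A quick way to check this is to compare the column names of the two data frames directly. The tiny data frames below are made up for illustration and only mirror the column layout described in the question; `missing_in_test` and `extra_in_test` are hypothetical names.

```r
# Hypothetical miniature versions of the training and test data frames.
train <- data.frame(roll_belt  = c(1.41, 123),
                    pitch_belt = c(8.07, 27),
                    classe     = c("A", "B"))
testing1 <- data.frame(roll_belt  = c(123, 1.02),
                       pitch_belt = c(27, 4.87),
                       problem_id = c(1, 2))

# Columns present in one data frame but not in the other.
missing_in_test <- setdiff(names(train), names(testing1))
extra_in_test   <- setdiff(names(testing1), names(train))

# Every predictor used for training should exist in the test data.
predictors <- setdiff(names(train), "classe")
all_predictors_present <- all(predictors %in% names(testing1))
```

If the only differences are the outcome column in the training data and an identifier column in the test data, the predictor names themselves line up and the cause of the error is probably elsewhere.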

answered 2014-10-24T18:43:26.383