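One of the variables, "Cabin", has a large number of NAs. I am trying to use a decision tree (rpart) to predict the cabin deck for passengers whose cabin is not available.

At the moment, this is the structure of my data frame, which is an rbind of the training and test sets.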

 $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
 $ Pclass     : Factor w/ 3 levels "1","2","3": 3 1 3 1 3 3 1 3 3 2 ...
 $ Name       : chr  "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
 $ Sex        : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
 $ Age        : num  22 38 26 35 35 ...
 $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
 $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
 $ Ticket     : Factor w/ 929 levels "110152","110413",..: 524 597 670 50 473 276 86 396 345 133 ...
 $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
 $ Cabin      : Factor w/ 187 levels "","A10","A14",..: NA 83 NA 57 NA NA 131 NA NA NA ...
 $ Embarked   : Factor w/ 3 levels "C","Q","S": 3 1 3 3 3 2 3 3 3 1 ...
 $ Survived   : Factor w/ 2 levels "0","1": 1 2 2 2 1 1 1 1 2 2 ...
 $ FamilySize : num  2 2 1 2 1 1 1 5 3 2 ...
 $ FamilyID   : Factor w/ 8 levels "11","3","4","5",..: 8 8 8 8 8 8 8 4 2 8 ...
 $ FamilyID2  : Factor w/ 7 levels "11","4","5","6",..: 7 7 7 7 7 7 7 3 7 7 ...
 $ Title      : Factor w/ 11 levels "Col","Dr","Lady",..: 7 8 5 8 7 7 7 4 8 8 ...
 $ Surname    : chr  "Braund" "Cumings" "Heikkinen" "Futrelle" ...
 $ Cabin2     : Factor w/ 8 levels "A","B","C","D",..: NA 3 NA 3 NA NA 5 NA NA NA ...

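Note that I created "Cabin2" using strsplit to extract the letter from the "Cabin" variable; as I understand it, that letter corresponds to the deck on the Titanic. This greatly reduces the number of levels I have to deal with, from 187 for "Cabin" to 8 for "Cabin2".

I am trying to predict the cabin deck with the following code: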
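For readers who want to reproduce the deck extraction, here is a minimal sketch of one way to do it with strsplit. The sample values and the names `cabin`, `deck`, and `cabin2` are hypothetical stand-ins, not the exact code used to build combi$Cabin2; with the real data, Cabin is a factor and would first need `as.character()`.

```r
# Hypothetical sample values standing in for as.character(combi$Cabin).
cabin <- c(NA, "C85", NA, "C123", "E46", "", "B57 B59 B63 B66")

# Treat empty strings as missing so they do not become a "" level.
cabin[!is.na(cabin) & cabin == ""] <- NA

# strsplit() with split = "" breaks each string into single characters;
# the first character is taken as the deck letter (NA stays NA).
deck <- vapply(strsplit(cabin, ""), function(chars) chars[1], character(1))

# Convert to a factor so only the deck letters actually seen become levels.
cabin2 <- factor(deck)
print(cabin2)
```

Passengers listed with several cabins (such as "B57 B59 B63 B66") are assigned the deck of the first one in this sketch, which is a simplifying assumption.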

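cabinFit <- rpart(Cabin2 ~ Age + Sex + Fare + Embarked + SibSp + Parch + Title + FamilySize + FamilyID,
                  # The rest of this call is missing from the post; the two arguments
                  # below are an assumption (fit on the rows where Cabin2 is known).
                  data = combi[!is.na(combi$Cabin2), ], method = "class")

combi$Cabin2[is.na(combi$Cabin2)] <- predict(cabinFit, combi[is.na(combi$Cabin2), ])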

R gives me the following output:

 Warning messages:
 1: In `[<-.factor`(`*tmp*`, is.na(combi$Cabin2), value = c(NA, 3L,   :
  invalid factor level, NA generated
 2: In `[<-.factor`(`*tmp*`, is.na(combi$Cabin2), value = c(NA, 3L,   :
  number of items to replace is not a multiple of replacement length
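One thing that may be worth checking here is the shape of what predict() returns. The sketch below uses the built-in iris data as a stand-in for combi (an assumption made purely for illustration, with Species playing the role of Cabin2). For a classification tree, predict() without a `type` argument returns a matrix of class probabilities, while `type = "class"` returns a factor with one value per row. Whether that difference explains the warnings above is not established by the output shown.

```r
library(rpart)

# iris stands in for combi here; Species plays the role of Cabin2.
fit <- rpart(Species ~ ., data = iris, method = "class")
newdata <- iris[c(1, 51, 101), ]

# With no type argument, a classification tree returns class probabilities:
# a matrix with one row per observation and one column per factor level.
probs <- predict(fit, newdata)
print(dim(probs))

# type = "class" returns a factor with one predicted level per row,
# which has the same length as the rows being filled in.
classes <- predict(fit, newdata, type = "class")
print(length(classes))
print(levels(classes))
```

Comparing `length(classes)` with the number of NA entries being replaced is a quick way to see whether the two sides of the assignment line up.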

I am still tinkering with the data and trying hard to figure this out myself, but I would like to know why this code does not work for me.
