I have the following data:

PassengerId Survived Pclass    Sex Age SibSp Parch    Fare Embarked
1           1        0      3   male  22     1     0  7.2500        S
2           2        1      1 female  38     1     0 71.2833        C
3           3        1      3 female  26     0     0  7.9250        S
4           4        1      1 female  35     1     0 53.1000        S
5           5        0      3   male  35     0     0  8.0500        S
6           6        0      3   male  NA     0     0  8.4583        Q

Now, when I use dummy or dummy.data.frame, I can successfully convert the factors (here Sex and Embarked) to dummy variables like this:

PassengerId Survived Pclass Sexfemale Sexmale Age SibSp Parch    Fare Embarked EmbarkedC EmbarkedQ EmbarkedS
1           1        0      3         0       1  22     1     0  7.2500        0         0         0         1
2           2        1      1         1       0  38     1     0 71.2833        0         1         0         0
3           3        1      3         1       0  26     0     0  7.9250        0         0         0         1
4           4        1      1         1       0  35     1     0 53.1000        0         0         0         1
5           5        0      3         0       1  35     0     0  8.0500        0         0         0         1
6           6        0      3         0       1  NA     0     0  8.4583        0         0         1         0

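However, if I apply it to the Age column, it creates more than 100 dummy columns: one for each unique age value and one for NA. I want the output to look like this: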

Age   Age.NA
22    0 
38    0
......
35    0
0     1

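For factors, it automatically treats missing values as a distinct level and creates a variable for them. I want to achieve the same for numeric variables, without disturbing the values already in the column. Please help.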

2 Answers

You can use:

df$Age.NA <- ifelse(is.na(df$Age), 1, 0)

And then:

library(dummies)
dummy.data.frame(df)

Output:

  PassengerId Survived Pclass Sexfemale Sexmale Age SibSp Parch    Fare EmbarkedC EmbarkedQ EmbarkedS Age.NA
1           1        0      3         0       1  22     1     0  7.2500         0         0         1      0
2           2        1      1         1       0  38     1     0 71.2833         1         0         0      0
3           3        1      3         1       0  26     0     0  7.9250         0         0         1      0
4           4        1      1         1       0  35     1     0 53.1000         0         0         1      0
5           5        0      3         0       1  35     0     0  8.0500         0         0         1      0
6           6        0      3         0       1  NA     0     0  8.4583         0         1         0      1
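Note that the output above still has NA in Age, whereas the desired output in the question shows 0 in that position. A minimal sketch of one way to get there, assuming it is acceptable to overwrite the missing ages with 0 once the indicator has been created (the small data frame below is a stand-in for the question's data, not the full set):

```r
# Stand-in for the question's data (only the columns needed here).
df <- data.frame(
  PassengerId = 1:6,
  Age = c(22L, 38L, 26L, 35L, 35L, NA)
)

# Flag the missing ages first, while the NA is still present.
df$Age.NA <- ifelse(is.na(df$Age), 1, 0)

# Then replace the missing ages with 0, as in the desired output.
df$Age[is.na(df$Age)] <- 0L

print(df)
```

The order matters: if the NA values were replaced first, is.na() would no longer find anything to flag.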

Data:

df <- structure(list(PassengerId = 1:6, Survived = c(0L, 1L, 1L, 1L, 
0L, 0L), Pclass = c(3L, 1L, 3L, 1L, 3L, 3L), Sex = structure(c(2L, 
1L, 1L, 1L, 2L, 2L), .Label = c("female", "male"), class = "factor"), 
    Age = c(22L, 38L, 26L, 35L, 35L, NA), SibSp = c(1L, 1L, 0L, 
    1L, 0L, 0L), Parch = c(0L, 0L, 0L, 0L, 0L, 0L), Fare = c(7.25, 
    71.2833, 7.925, 53.1, 8.05, 8.4583), Embarked = structure(c(3L, 
    1L, 3L, 3L, 3L, 2L), .Label = c("C", "Q", "S"), class = "factor"), 
    Age.NA = c(0, 0, 0, 0, 0, 1)), .Names = c("PassengerId", 
"Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", 
"Embarked", "Age.NA"), row.names = c("1", "2", "3", "4", "5", 
"6"), class = "data.frame")

answered 2015-09-11T09:28:53.340

Use an ifelse() statement to check for NA:

Age.NA <- ifelse(is.na(Age), 1, 0)
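
If several numeric columns contain missing values, the same ifelse() check can be applied in a loop. The sketch below is one possible way to do that; add_na_indicators is a hypothetical helper name, not a function from the dummies package or base R, and the example data frame is made up for illustration:

```r
# Hypothetical helper: for every numeric column that contains at least one NA,
# add a companion "<name>.NA" column holding 1 where the value is missing.
add_na_indicators <- function(data) {
  for (col in names(data)) {
    if (is.numeric(data[[col]]) && anyNA(data[[col]])) {
      data[[paste0(col, ".NA")]] <- ifelse(is.na(data[[col]]), 1, 0)
    }
  }
  data
}

# Illustrative data: Age has a missing value, Fare does not.
df <- data.frame(Age = c(22, NA, 26), Fare = c(7.25, 71.2833, 7.925))
out <- add_na_indicators(df)
print(out)
```

Columns without any NA (Fare here) get no indicator, and the original values are left untouched.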

于 2015-09-11T09:24:07.033 回答