
I have a data reshaping problem that I could use some help with.

 ID          X1         X2         X3         X4         X5
6001 Certificate  Associate Bachelor's   Master's   Doctoral
5001 Certificate  Associate Bachelor's           
3311 Certificate  Associate Bachelor's           
1981 Certificate  Associate Bachelor's   Master's
4001   Associate Bachelor's   Master's           
2003   Associate Bachelor's   Master's   Doctoral
2017 Certificate  Associate                      
1001   Associate Bachelor's   Master's           
5002  Bachelor's

I need to turn these into dummy variables:

  ID    Certificate     Associates      Bachelor         Master        Doctoral      
6001              1              1             1              1               1
5001              1              1             1              0               0 
2017              1              1             0              0               0

Any suggestions?


1 Answer


Try the reshape2 package. I'm assuming your dataset is called df.

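library(reshape2)
# First, melt your data into long format, using ID as the id variable
m.df <- melt(df, id.vars = "ID")
# Then `cast` it back to wide format, counting the occurrences of each value per ID
dcast(m.df, ID ~ value, fun.aggregate = length)
# Note: the Var.2 column in the output counts the blank cells for each ID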
#     ID Var.2 Associate Bachelor's Certificate Doctoral Master's
# 1 1001     2         1          1           0        0        1
# 2 1981     1         1          1           1        0        1
# 3 2003     1         1          1           0        1        1
# 4 2017     3         1          0           1        0        0
# 5 3311     2         1          1           1        0        0
# 6 4001     2         1          1           0        0        1
# 7 5001     2         1          1           1        0        0
# 8 5002     4         0          1           0        0        0
# 9 6001     0         1          1           1        1        1

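I haven't tested this, but if you convert the values to a factor with ordered levels, that will probably control the order of the output columns.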
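As a rough sketch of that idea (not tested against the original data): the example below rebuilds a few rows of the question's table by hand, assumes blank cells are stored as empty strings, drops them before casting so that no extra column appears, and sets the factor levels explicitly. The name `degree_levels` and the sample `df` are made up for illustration.

```r
library(reshape2)

# Hypothetical reconstruction of a few rows from the question;
# blank cells are assumed to be empty strings.
df <- data.frame(
  ID = c(6001, 5001, 2017, 5002),
  X1 = c("Certificate", "Certificate", "Certificate", "Bachelor's"),
  X2 = c("Associate", "Associate", "Associate", ""),
  X3 = c("Bachelor's", "Bachelor's", "", ""),
  X4 = c("Master's", "", "", ""),
  X5 = c("Doctoral", "", "", ""),
  stringsAsFactors = FALSE
)

# Long format: one row per (ID, column, value)
m.df <- melt(df, id.vars = "ID")

# Drop blank cells so they do not produce a column of their own
m.df <- m.df[m.df$value != "", ]

# Set the level order explicitly; the cast columns should follow it
degree_levels <- c("Certificate", "Associate", "Bachelor's",
                   "Master's", "Doctoral")
m.df$value <- factor(m.df$value, levels = degree_levels)

# Wide format: one 0/1 column per degree
dummies <- dcast(m.df, ID ~ value, fun.aggregate = length)
print(dummies)
```

If a degree could appear more than once for the same ID, `length` would return a count greater than 1, so you might want to cap the values at 1 afterwards.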

Answered 2012-07-13T19:26:11.893