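I have a data frame with many variables and want to convert the categorical ones into dummy variables. I tried model.matrix, but it does not work as expected. See the example below: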

age = c(1:15)                                                          #numeric
sex = c(rep(0,7),rep(1,8)); sex = as.factor(sex)                       #factor
bloodtype = c(rep('A',2),rep('B',8),rep('O',1),rep('AB',4));bloodtype = as.factor(bloodtype)         #factor
bodyweight = c(11:25)                                                  #numeric

wholedata = data.frame(cbind(age,sex,bloodtype,bodyweight))

model.matrix(~.,data=wholedata)[,-1]

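The reason I did not use model.matrix(~age+sex+bloodtype+bodyweight)[,-1] is that this is just a toy example. In the real data I may have dozens or hundreds of columns, and I don't think typing out every variable name is a good idea.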

Thanks

1 Answer

It's the cbind that's messing things up. It converts your factors to their integer codes, so model.matrix sees plain numeric columns and creates no dummy variables for them.

If you just do wholedata = data.frame(age,sex,bloodtype,bodyweight) there should be no problem.

cbind returns a matrix and in a matrix everything must have the same type. The result in this example is that the factors are converted to integers (which is the underlying representation of a factor in the first place) and then the type of the matrix is integer.

Try

wholedata = cbind(age,sex,bloodtype,bodyweight)
is.integer(wholedata) ## TRUE
is.factor(wholedata[,2]) ## FALSE

wholedata = data.frame(age,sex,bloodtype,bodyweight)
is.integer(wholedata) ## FALSE
is.factor(wholedata[,2]) ## TRUE
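
For completeness, here is a minimal sketch of the whole thing put together, using the toy data from the question. The name dummies is just an illustrative choice, and the column names in the comment assume R's default treatment contrasts and the usual alphabetical ordering of factor levels.

```r
# Toy data from the question
age        <- 1:15
sex        <- factor(c(rep(0, 7), rep(1, 8)))
bloodtype  <- factor(c(rep("A", 2), rep("B", 8), rep("O", 1), rep("AB", 4)))
bodyweight <- 11:25

# Build the data frame directly (no cbind), so the factors stay factors
wholedata <- data.frame(age, sex, bloodtype, bodyweight)

# ~ . uses every column, so no variable names need to be typed out;
# [, -1] drops the intercept column
dummies <- model.matrix(~ ., data = wholedata)[, -1]

colnames(dummies)
## "age" "sex1" "bloodtypeAB" "bloodtypeB" "bloodtypeO" "bodyweight"
```

Each factor with k levels yields k - 1 indicator columns here, because the first level (sex 0, blood type A) serves as the reference level.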
answered 2014-08-20T19:40:05.797