I need to convert categorical variables into multiple binary ("dummy") variables for use in logistic regression. Say my data frame is:
tdf <- data.frame(first = sample(c("A", "B", "C", "D"), 100, replace = TRUE),
                  lobe = sample(c("RUL", "RML", "RLL", "LUL", "LLL"), 100, replace = TRUE),
                  continuous = sample(1:100, 100),
                  smoker = sample(c("never", "less20", "more20"), 100, replace = TRUE)
)
I can do this manually:
first. <- with (tdf, factor (first))
dummies <- model.matrix(~ first.)
dummies <- dummies[,-1]
tdf <- cbind(tdf, dummies)
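Note that it is important to call the factor 'first.' (or, more generally, 'variable.'), because the dummy variables inherit this prefix in their names, which makes them easier to identify later ('variable1.factor2', 'variable1.factor3', etc.).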
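To make the naming behaviour concrete, here is a small sketch with a made-up toy factor (the values are illustrative only, not taken from the data frame above). With R's default treatment contrasts, model.matrix() pastes the variable name and the level together, so the trailing dot ends up acting as a separator:

```r
# Hypothetical toy factor; the name 'first.' carries the trailing dot
first. <- factor(c("A", "B", "C", "D", "A"))

# Default treatment contrasts: one intercept column plus one column
# per non-reference level
dummies <- model.matrix(~ first.)

# Column names are the variable name followed by the level:
# (Intercept), first.B, first.C, first.D
dummy_names <- colnames(dummies)
print(dummy_names)
```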
My question is: how can I do this with a function that assigns the variable names programmatically:
dummify <- function(df, vectorOfColIndices) {
  cn <- colnames(df)
  for (i in vectorOfColIndices) {
    t. <- with (tdf, factor (df[i])) # temporary factor
    assign (cn[i], t.) # give it the proper 'Variable.' name
    dummies <- model.matrix(~ ????) # Stuck here: how do I call this newly created structure?
    ...
  }
}
That way I could later transform the data frame like this:
vd <- c(1,2,4) # columns that need to be converted into dummy vars
df <- dummify(df, vd)
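One possible way to fill in the `????` step, offered as a sketch under stated assumptions rather than a definitive answer: instead of using assign(), build the formula from a string with reformulate() and pass a one-column data frame whose column is named with the trailing dot. The helper name `prefix` and the temporary data frame `tmp` are introduced here for illustration. The sketch assumes the selected columns contain no missing values, since model.matrix() would otherwise drop those rows and the cbind() would fail.

```r
dummify <- function(df, vectorOfColIndices) {
  cn <- colnames(df)
  for (i in vectorOfColIndices) {
    # 'prefix' (hypothetical helper name) is the column name plus a dot,
    # e.g. "first.", so the dummies come out as "first.B", "first.C", ...
    prefix <- paste0(cn[i], ".")

    # One-column data frame holding the factor under the prefixed name
    tmp <- data.frame(factor(df[[i]]))
    names(tmp) <- prefix

    # reformulate("first.") builds the formula ~first. from the string
    mm <- model.matrix(reformulate(prefix), data = tmp)

    # Drop the intercept; drop = FALSE keeps a matrix for two-level factors
    dummies <- mm[, -1, drop = FALSE]
    df <- cbind(df, dummies)
  }
  df
}

# Usage on a data frame shaped like the one in the question
set.seed(1)
tdf <- data.frame(first = sample(c("A", "B", "C", "D"), 100, replace = TRUE),
                  lobe = sample(c("RUL", "RML", "RLL", "LUL", "LLL"), 100, replace = TRUE),
                  continuous = sample(1:100, 100),
                  smoker = sample(c("never", "less20", "more20"), 100, replace = TRUE))
vd <- c(1, 2, 4) # columns that need to be converted into dummy vars
tdf2 <- dummify(tdf, vd)
print(colnames(tdf2))
```

The original columns are kept alongside the new dummy columns, mirroring the manual cbind() approach. Note that for logistic regression with glm(), a factor column in the formula is expanded into dummy variables automatically, so explicit dummies are mainly useful when the columns themselves are needed.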