Create an example data set:
set.seed(23452)
## create 5 variables with up to 15 levels and 5 variables with up to 20 levels
nrowd <- 100
full <- data.frame(
  replicate(5, letters[sample(sample(1:24, 15), nrowd, replace = TRUE)]),
  replicate(5, LETTERS[sample(sample(1:24, 20), nrowd, replace = TRUE)]),
  stringsAsFactors = TRUE  # needed in R >= 4.0.0, where the default is FALSE
)
### the following code represents a process that creates a data frame with variables
### that have no more levels than full but may have fewer levels
scoring.set <- data.frame(sapply(full[sample(1:nrow(full), 10), ], as.character),
                          stringsAsFactors = TRUE)
# the factor levels are not the same, so this is expected to return FALSE
identical(sapply(full, levels), sapply(scoring.set, levels))
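To see why the levels differ, here is a minimal sketch on a toy vector (the names `x` and `sub` are made up for illustration and are not part of the data above). Rebuilding a factor from a subset of its values, via character, keeps only the levels that actually occur in the subset.

```r
# Hypothetical toy factor with three levels
x <- factor(c("a", "b", "c", "a"))

# Take a subset and rebuild the factor from its character values;
# levels that do not occur in the subset are lost
sub <- factor(as.character(x[c(1, 4)]))

levels(x)    # "a" "b" "c"
levels(sub)  # "a"
```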
Here is how you can fix the factor levels:
## give the variables in scoring.set the levels of the matching variables in full
scoring.set2 <- data.frame(
  mapply(scoring.set, lapply(full, levels), SIMPLIFY = FALSE,
         FUN = function(scoring.var, full.level) {
           factor(scoring.var, levels = union(full.level, levels(scoring.var)))
         })
)
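The re-leveling step can be checked in isolation. The sketch below uses a made-up level set and a made-up scoring vector (both hypothetical) that deliberately contains a value, "d", which is absent from the reference levels. Because `union()` lists the reference levels first, they keep their order and any extra level is appended at the end instead of turning into `NA`.

```r
# Hypothetical reference levels and scoring variable
full.level  <- c("a", "b", "c")
scoring.var <- factor(c("b", "d"))

fixed <- factor(scoring.var, levels = union(full.level, levels(scoring.var)))

levels(fixed)        # "a" "b" "c" "d"
as.character(fixed)  # "b" "d"  (the values themselves are unchanged)
```

When the scoring variable has no extra levels, the union is simply the reference level set, so the result has exactly the levels of the reference factor.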
The values of the variables are still the same as before, and the variables now have the same levels as full:
all(
  mapply(scoring.set, scoring.set2, FUN = function(x, y) {
    identical(as.character(x), as.character(y))
  })
)
identical(sapply(full, levels), sapply(scoring.set2, levels))
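Introducing non-factor variables complicates things, but the general idea is to subset to only the factor variables:
factor.vars <- scoring.set[, sapply(scoring.set, is.factor)]
then fix the levels of those columns as above (giving fixed.factor.vars) and do something like
data.frame(fixed.factor.vars, scoring.set[, !sapply(scoring.set, is.factor)])[, names(scoring.set)]
to put everything back together in the original column order.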
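A minimal sketch of that mixed case follows. The data frames `full.mixed` and `scoring.mixed` and the helper `is.fac` are hypothetical and only for illustration. It assumes that both data frames have the same column names and that the same columns are factors in both. `drop = FALSE` is added so that the subsets stay data frames even when only one column is selected.

```r
# Hypothetical toy data: factor, numeric, factor
full.mixed <- data.frame(
  f1 = factor(c("a", "b", "c")),
  n1 = c(1.5, 2.5, 3.5),
  f2 = factor(c("x", "y", "z"))
)
scoring.mixed <- data.frame(
  f1 = factor(c("a", "a")),
  n1 = c(9, 8),
  f2 = factor(c("z", "x"))
)

# Which columns are factors (assumed to be the same columns in both data frames)
is.fac <- sapply(scoring.mixed, is.factor)

# Subset to the factor variables only
factor.vars <- scoring.mixed[, is.fac, drop = FALSE]

# Give each factor variable the levels of the matching variable in full.mixed
fixed.factor.vars <- data.frame(
  mapply(factor.vars, lapply(full.mixed[, is.fac, drop = FALSE], levels),
         SIMPLIFY = FALSE,
         FUN = function(scoring.var, full.level) {
           factor(scoring.var, levels = union(full.level, levels(scoring.var)))
         })
)

# Recombine with the non-factor variables and restore the original column order
fixed <- data.frame(
  fixed.factor.vars,
  scoring.mixed[, !is.fac, drop = FALSE]
)[, names(scoring.mixed)]

names(fixed)      # "f1" "n1" "f2"
levels(fixed$f2)  # "x" "y" "z"
```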