r - Grouping a variable with many levels

Question

Suppose I have factor variables with many levels, and I am trying to collapse those levels into a few groups.

> levels(dat$years_continuously_insured_order2)
 [1] "1"    "2"    "3"    "4"    "5"    "6"    "7"    "8"    "9"    "10"   "11"   "12"   "13"   "14"   "15"   "16"   "17"   "18"  
[19] "19"   "20" 

> levels(dat$age_of_oldest_driver)
 [1] "-16" "1"   "15"  "16"  "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"  "27"  "28"  "29"  "30"  "31"  "32"  "33" 
[22] "34"  "35"  "36"  "37"  "38"  "39"  "40

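I have a script that loops over these variables and bins them into a few categories. However, the number of levels can be (and usually is) different every time the script runs. So if my original grouping code looks like the code below, it becomes useless when the script runs an hour later and the levels have changed: I might now have 25 levels instead of 15, with different values, but I still need to group them into the same specific categories.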

dat$years_continuously_insured2 <- NA
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[1]] <- NA
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[2:3]] <- "1 or less"
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[4]] <- "2"
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[5:7]] <- "3 +"
dat$years_continuously_insured2 <- factor(dat$years_continuously_insured2)

Is there a more elegant way to group these variables into segments? Is there a better way to do this in R?

Thanks!

score 2 · Accepted Answer

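You can convert the factor levels of the continuously-insured variable to numbers, then cut() them into your categories and re-factor(). The first step is described in the R-FAQ (doing it correctly is a two-step process):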

 # labels go to cut() so they stay matched to the bins even when a bin is empty
 dat$years_cont <-  factor( cut(  as.numeric(as.character( 
                                      dat$years_continuously_insured_order2)),
                                  breaks=c(0,2,3, Inf), right=FALSE,
                                  labels=c( "1 or less", "2", "3 +")  )
                            )
#-----------------
> str(dat)
'data.frame':   100 obs. of  2 variables:
 $ years_continuously_insured_order2: Factor w/ 20 levels "1","10","11",..: 4 15 19 5 8 4 16 12 12 18 ...
 $ years_cont                       : Factor w/ 3 levels "1 or less","2",..: 3 3 3 3 3 3 3 2 2 3 ...
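
The two-step conversion matters because calling as.numeric() directly on a factor returns the internal level codes, not the values printed as labels. The sketch below uses a small made-up factor `f` (not the poster's data) to show the difference:

```r
# Toy factor, assumed for illustration; levels sort as strings: "1" "10" "2" "20"
f <- factor(c("1", "10", "2", "20"))

codes  <- as.numeric(f)                 # level codes:        1  2  3  4
values <- as.numeric(as.character(f))   # the actual values:  1 10  2 20

# Equivalent form that converts each level only once
values2 <- as.numeric(levels(f))[f]     #                     1 10  2 20

print(codes)
print(values)
print(values2)
```

Only the second and third forms are safe to pass to cut().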

score 0 · Accepted Answer

If your original column holds numbers, treat it as numeric rather than as a factor. A simpler way to do what you are doing is:

bin.value = function(x) {
    ifelse(x <= 1, "1 or less", ifelse(x == 2, "2", "3+"))
}

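# as.character() first: as.integer() on a factor would return level codes, not values
dat$years_continuously_insured2 = as.factor(bin.value(as.integer(as.character(dat$years_continuously_insured))))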
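
Since the set of levels changes between runs, it may help to wrap the binning in one function that depends only on the values. This is a minimal sketch, not either answerer's code: the helper name `group_levels`, the toy factors `run1` and `run2`, and the choice of `-Inf` as the lowest break (so that values such as "-16" fall into the first bin instead of becoming NA) are all assumptions made for illustration.

```r
# Hypothetical helper: bins a factor of numeric-looking levels by value.
group_levels <- function(f, breaks, labels) {
  x <- as.numeric(as.character(f))   # recover the values, not the level codes
  # right = FALSE gives half-open bins [a, b); labels map one-to-one to bins
  cut(x, breaks = breaks, labels = labels, right = FALSE)
}

breaks <- c(-Inf, 2, 3, Inf)
labels <- c("1 or less", "2", "3 +")

# Two made-up runs with different sets of levels
run1 <- factor(c("1", "2", "3", "7", "15"))
run2 <- factor(c("4", "25", "9"))    # nothing below 3 in this run

g1 <- group_levels(run1, breaks, labels)
g2 <- group_levels(run2, breaks, labels)

print(table(g1))
print(table(g2))   # empty bins are kept as levels with a count of 0
```

Because the labels are given to cut() itself, both results share the same three levels regardless of which values happen to be present in a given run.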
