
Suppose I have a factor variable with many levels, and I am trying to group those levels into a few categories.

> levels(dat$years_continuously_insured_order2)
 [1] "1"    "2"    "3"    "4"    "5"    "6"    "7"    "8"    "9"    "10"   "11"   "12"   "13"   "14"   "15"   "16"   "17"   "18"  
[19] "19"   "20" 

> levels(dat$age_of_oldest_driver)
 [1] "-16" "1"   "15"  "16"  "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"  "27"  "28"  "29"  "30"  "31"  "32"  "33" 
[22] "34"  "35"  "36"  "37"  "38"  "39"  "40"

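I have a script that loops over these variables and groups each of them into a few categories. However, the number of levels can (and usually does) change every time the script runs. If my original code for grouping a variable looks like the code below, it becomes useless when the script runs again an hour later with different levels. I might then have 25 levels instead of 15, with different values, but I still need to group them into the same categories.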

# Each group is selected by level *position*, so this breaks when the set of levels changes
dat$years_continuously_insured2 <- NA
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[1]] <- NA
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[2:3]] <- "1 or less"
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[4]] <- "2"
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[5:7]] <- "3 +"
dat$years_continuously_insured2 <- factor(dat$years_continuously_insured2)

What is a more elegant way to group the variables into segments? Is there a better way to do this in R?

Thanks!


2 Answers


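You can convert the factor levels of the continuously-insured variable to numbers and then use cut() to bin them into your categories, passing the category names as cut()'s labels. The first step is described in the R FAQ (7.10, "How do I convert factors to numeric?"); doing it correctly takes two steps, factor to character and then character to numeric:

 # factor -> character -> numeric, then bin into [0,2), [2,3) and [3,Inf).
 # Giving labels to cut() directly keeps all three levels even if one bin is
 # empty in a given run (factor(..., labels = ...) would fail in that case).
 dat$years_cont <- cut(as.numeric(as.character(dat$years_continuously_insured_order2)),
                       breaks = c(0, 2, 3, Inf), right = FALSE,
                       labels = c("1 or less", "2", "3 +"))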
#-----------------
> str(dat)
'data.frame':   100 obs. of  2 variables:
 $ years_continuously_insured_order2: Factor w/ 20 levels "1","10","11",..: 4 15 19 5 8 4 16 12 12 18 ...
 $ years_cont                       : Factor w/ 3 levels "1 or less","2",..: 3 3 3 3 3 3 3 2 2 3 ...
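To make the recoding survive a changing set of levels, the cut() approach above can be wrapped in a small function that is applied on every run. This is a minimal sketch: the function name `bin_years` and the sample vectors are hypothetical, and the break points simply mirror the ones above. Because it works on the numeric values rather than on level positions, the same call handles a factor with 15 levels or 25.

```r
# Hypothetical helper: bin a factor whose levels are numbers stored as text.
bin_years <- function(f) {
  # Two-step conversion from the R FAQ: factor -> character -> numeric.
  # as.numeric(f) alone would return the internal level codes instead.
  x <- as.numeric(as.character(f))
  # right = FALSE gives [0, 2) -> "1 or less", [2, 3) -> "2", [3, Inf) -> "3 +"
  cut(x, breaks = c(0, 2, 3, Inf), right = FALSE,
      labels = c("1 or less", "2", "3 +"))
}

# Two "runs" with different level sets (made-up sample data)
run1 <- factor(c("1", "2", "5", "12"))
run2 <- factor(c("1", "3", "20", "25", "2", "7"))

bin_years(run1)
bin_years(run2)
```

Values below 0 (such as the "-16" level shown in the question) fall outside the breaks and come back as NA, and any level that is not a number becomes NA with a coercion warning.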
answered 2012-09-10T17:35:24.540

If your original column holds numbers, treat it as numeric rather than as a factor. A simpler way to do what you are doing is:

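bin.value = function(x) {
    # x must be numeric: <= 1, exactly 2, or anything larger
    ifelse(x <= 1, "1 or less", ifelse(x == 2, "2", "3+"))
}

# Convert factor -> character -> integer; calling as.integer() on the factor
# itself would return the level codes, not the values
dat$years_continuously_insured2 = as.factor(bin.value(as.integer(as.character(dat$years_continuously_insured))))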
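The conversion step is easy to get wrong: as.integer() applied directly to a factor returns the internal level codes, not the numbers printed as labels. A quick check with made-up data (the vector `f` is hypothetical; `bin.value` is copied from the answer):

```r
bin.value <- function(x) {
  ifelse(x <= 1, "1 or less", ifelse(x == 2, "2", "3+"))
}

f <- factor(c("2", "10", "3"))
levels(f)                    # "10" "2" "3"  (levels are sorted as text)
as.integer(f)                # 2 1 3   : the level codes
as.integer(as.character(f))  # 2 10 3  : the actual values
as.integer(levels(f))[f]     # 2 10 3  : the R FAQ's alternative form

bin.value(as.integer(f))                # "2" "1 or less" "3+" : bins the codes (wrong)
bin.value(as.integer(as.character(f)))  # "2" "3+" "3+"        : bins the values
```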
answered 2012-09-10T17:09:52.710