2

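I did this the hard way and was wondering how you would do it with a loop or some faster method. I am working with age groups and creating labels for the levels to use in a cut statement.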

levels(age_group) <- c("<10","10-19","20-29","30-39","40-49","50-59","60-69","70-79","80-89","90-99","100-109",
             "110-119","120-129","130+")

Does anybody have any good ideas on how to do this? I could add the end labels "<10" and "130+" by hand, but I am sure there is a faster way to do the rest.

Thanks

4

4 Answers

3

It is better to use the levels generated by cut, because your current intervals do not define which end is inclusive.

s <- c(-Inf,seq(10,130,10),Inf)
levels(cut(s,s))
#  [1] "(-Inf,10]"  "(10,20]"    "(20,30]"    "(30,40]"    "(40,50]"   
#  [6] "(50,60]"    "(60,70]"    "(70,80]"    "(80,90]"    "(90,100]"  
# [11] "(100,110]"  "(110,120]"  "(120,130]"  "(130, Inf]"
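
To make the "which end is inclusive" point concrete, here is a small sketch. The `ages` vector is made up for illustration. `right = FALSE` is a standard argument of `cut` that closes each interval on the left instead of the right, which is probably closer to what a label like "10-19" implies.

```r
s <- c(-Inf, seq(10, 130, 10), Inf)
ages <- c(9, 10, 19, 20, 130)  # hypothetical ages sitting on or near the breaks

# Default: intervals are (a, b], so 10 falls into the first bin
right_closed <- as.integer(cut(ages, s))
right_closed
# [1]  1  1  2  2 13

# right = FALSE: intervals are [a, b), so 10 starts the second bin
left_closed <- as.integer(cut(ages, s, right = FALSE))
left_closed
# [1]  1  2  2  3 14
```

The integer codes are the positions of the levels, so the two calls disagree exactly at the boundary values 10, 20 and 130.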

If you must use your current intervals, you can use this simple function:

strInterval <- function(start, end, by) {
  s <- seq(start, end, by)
  i <- paste(head(s,-1), s[-1]-1, sep="-")
  c(paste0("<",start), i, paste0(end,"+"))
}
strInterval(10,130,10)
#  [1] "<10"     "10-19"   "20-29"   "30-39"   "40-49"   "50-59"   "60-69"  
#  [8] "70-79"   "80-89"   "90-99"   "100-109" "110-119" "120-129" "130+" 
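
As a rough sketch of how these labels could then be passed to `cut`: the `ages` vector and the `breaks` name below are assumptions for illustration, and the labels only describe the bins correctly for whole-number ages with `right = FALSE`.

```r
strInterval <- function(start, end, by) {
  s <- seq(start, end, by)
  i <- paste(head(s, -1), s[-1] - 1, sep = "-")
  c(paste0("<", start), i, paste0(end, "+"))
}

# 15 break points give 14 bins, matching the 14 labels
breaks <- c(-Inf, seq(10, 130, 10), Inf)
ages <- c(5, 10, 45, 129, 130)  # hypothetical whole-number ages

age_group <- cut(ages, breaks = breaks,
                 labels = strInterval(10, 130, 10),
                 right = FALSE)  # bins are [10,20), [20,30), ...
as.character(age_group)
# [1] "<10"     "10-19"   "40-49"   "120-129" "130+"
```

With the default `right = TRUE` the same labels would be off by one at every boundary (an age of 10 would be labelled "<10"), so the `right = FALSE` choice matters here.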
answered 2013-08-14T20:56:08.897
3
cts <- seq(10,130, by=10)
paste(c("<=", cts), c(cts-1, "+") , sep="-")
# [1] "<=-9"    "10-19"   "20-29"   "30-39"   "40-49"   "50-59"   "60-69"  
# [8] "70-79"   "80-89"   "90-99"   "100-109" "110-119" "120-129" "130-+"  

You did say you could fix up the ends as needed, right?
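
One possible way to patch the two end labels afterwards, as a minimal sketch. The `labs` name is introduced here for illustration and is not part of the answer above.

```r
cts <- seq(10, 130, by = 10)
labs <- paste(c("<=", cts), c(cts - 1, "+"), sep = "-")

# Overwrite the first and last elements, which came out as "<=-9" and "130-+"
labs[1] <- paste0("<", cts[1])
labs[length(labs)] <- paste0(cts[length(cts)], "+")
labs
#  [1] "<10"     "10-19"   "20-29"   "30-39"   "40-49"   "50-59"   "60-69"
#  [8] "70-79"   "80-89"   "90-99"   "100-109" "110-119" "120-129" "130+"
```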

answered 2013-08-14T20:48:26.527
2

Just plug in the max/min and run the rest of the code.

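# Lower and upper limits of the labelled range
min <- 10
max <- 130

seq1 <- seq(min, max, by = 10)          # lower bounds: 10, 20, ..., 130
seq2 <- seq(min - 1, max - 1, by = 10)  # 9, 19, ..., 129; seq2[i + 1] is the upper bound for seq1[i]

# First label is "<min"; the remaining entries are placeholders filled in by the loop
age_group <- c(paste("<", min, sep = ""), rep("foo", length(seq1) - 1))

# seq_len() avoids the 1:0 trap if seq1 has only one element
for (i in seq_len(length(seq1) - 1)) {
  grp1 <- seq1[i]
  grp2 <- seq2[i + 1]
  age_group[i + 1] <- paste(grp1, "-", grp2, sep = "")
}

# Append the open-ended top group
age_group <- c(age_group, paste(max, "+", sep = ""))

age_group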
answered 2013-08-14T20:43:43.223
1

This is my solution posted earlier, with some changes here (it only applies if you use the cut function and use the intervals it produces):

mydata<-round(seq(1,20,length.out=5))
mydata<-as.data.frame(mydata)
names(mydata)<-"V" #name the column as V
mydata$V1<-cut(mydata$V,5) #break the data into five intervals and name that as col V1
mydata$lower<-with(mydata,round(as.numeric( sub("\\((.+),.*", "\\1", V1)))) #extract lower value
mydata$upper<-with(mydata,round(as.numeric( sub("[^,]*,([^]]*)\\]", "\\1",V1)))) # extract upper value
myfinaldata<-mydata[,c("lower","upper")] #create data frame of lower and upper values
myfinaldata$interval<-with(myfinaldata,paste(lower,upper,sep="-"))

myfinaldata
#   lower upper interval
# 1     1     5      1-5
# 2     5     9      5-9
# 3     9    12     9-12
# 4    12    16    12-16
# 5    16    20    16-20
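
If the regular expressions feel fragile, the same bounds can probably be pulled out by trimming the brackets and splitting on the comma. This is only a sketch: the `labs` vector is hard-coded in the format that `cut` produces, and the helper names (`inner`, `parts`) are made up here.

```r
# Hypothetical labels in the "(lower,upper]" format that cut() produces
labs <- c("(0.981,4.8]", "(4.8,8.6]", "(8.6,12.4]")

# Drop the first and last character (the brackets), then split on the comma
inner <- substr(labs, 2, nchar(labs) - 1)
parts <- strsplit(inner, ",", fixed = TRUE)

lower <- as.numeric(vapply(parts, `[`, character(1), 1))
upper <- as.numeric(vapply(parts, `[`, character(1), 2))

interval <- paste(round(lower), round(upper), sep = "-")
interval
# [1] "1-5"  "5-9"  "9-12"
```

As in the answer above, rounding makes adjacent labels share an endpoint ("1-5", "5-9"), so these strings still do not say which bin a boundary value belongs to.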
answered 2013-08-14T20:28:21.040