
I want to combine all factor levels with a count less than n into a single level called "Else".

For example, if n = 3, then in the df below I want to combine "c", "d" and "e" into "Else":

# stringsAsFactors=TRUE makes y a factor (the default became FALSE in R 4.0.0)
df = data.frame(x=1:10, y=c("a","a","a","b","b","b","c","d","d","e"), stringsAsFactors=TRUE)

I first build a data frame of all the low-count levels:

library(plyr)
# one row per level of y with fewer than 3 rows; the count ends up in column V1
lowcounts = ddply(df, "y", function(z){if(nrow(z)<3) nrow(z) else NULL})

I know I could change these by hand, but in reality I have dozens of levels, so I need to automate this.

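I just want to select the levels in levels(df$y) that are %in% lowcounts$y and rename them, leaving the rest unchanged, but I don't know how to proceed.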
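A minimal sketch of that last step in base R, assuming y is a factor: `levels<-` merges levels that are given the same new name, so the low-count levels can be renamed in place. To keep the sketch self-contained it recomputes the low-count set with `table()` rather than plyr; `n`, `counts` and `low` are illustrative names, not part of the question.

```r
# Example data from the question; y must be a factor
df <- data.frame(x = 1:10,
                 y = c("a","a","a","b","b","b","c","d","d","e"),
                 stringsAsFactors = TRUE)
n <- 3

# Levels seen fewer than n times (the same set as lowcounts$y)
counts <- table(df$y)
low <- names(counts)[counts < n]

# Giving several levels the same new name merges them into one level
levels(df$y)[levels(df$y) %in% low] <- "Else"

levels(df$y)  # levels are now "a", "b", "Else"
```

Only the level labels change, so the underlying rows keep their positions and "a" and "b" are untouched.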


2 Answers


Another option:

#your dataframe
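#stringsAsFactors=TRUE is needed in R >= 4.0.0, otherwise levels<- has no effect
df = data.frame(x=1:10, y=c("a","a","a","b","b","b","c","d","d","e"), stringsAsFactors=TRUE)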

#which levels to keep and which to change
res <- table(df$y)
notkeep <- names(res[res < 3])
keep <- names(res)[!names(res) %in% notkeep]
names(keep) <- keep

#set new levels
levels(df$y) <- c(keep, list("else" = notkeep))

df
#    x    y
#1   1    a
#2   2    a
#3   3    a
#4   4    b
#5   5    b
#6   6    b
#7   7 else
#8   8 else
#9   9 else
#10 10 else
answered 2013-11-11T12:23:52.103
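The list form of `levels<-` used in the answer above (each list element names a new level and lists the old levels mapped to it) can be wrapped in a reusable function. This is a sketch only; `lump_rare` and its `other` argument are hypothetical names, not part of base R or of the original answer.

```r
# Hypothetical helper: lump levels with fewer than n observations
# into one level, using the list form of levels<-
lump_rare <- function(f, n, other = "else") {
  f <- factor(f)                        # also accept a character vector
  counts <- table(f)
  rare <- names(counts)[counts < n]
  if (length(rare) == 0) return(f)      # nothing to lump
  keep <- setdiff(levels(f), rare)
  # kept levels map to themselves; all rare levels map to `other`
  new_levels <- c(as.list(setNames(keep, keep)),
                  setNames(list(rare), other))
  levels(f) <- new_levels
  f
}

y <- c("a","a","a","b","b","b","c","d","d","e")
r <- lump_rare(y, 3)
table(r)  # counts: a = 3, b = 3, else = 4
```

Every existing level must appear in the list; a level left out of it would become NA, which is why the kept levels are mapped to themselves explicitly.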

Why not just do this?

library(data.table)
dt <- data.table(df)
# by = "y": .N is the number of rows in each group and y is that group's value
dt[,ynew := ifelse(.N < 3, "else",as.character(y)), by = "y"]
answered 2013-11-11T11:32:29.250
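The same "count per group, then ifelse" idea works without data.table. A sketch in base R, using `ave()` to attach each row's group size (the role `.N` plays above); `grp_n` is an illustrative name. Like the data.table version, it produces a new character column instead of changing the factor's levels.

```r
df <- data.frame(x = 1:10,
                 y = c("a","a","a","b","b","b","c","d","d","e"),
                 stringsAsFactors = TRUE)

# For every row, the number of rows sharing its value of y
grp_n <- ave(seq_along(df$y), df$y, FUN = length)

# Rows from groups smaller than 3 get "else"; the rest keep their label
df$ynew <- ifelse(grp_n < 3, "else", as.character(df$y))

df$ynew
```

If a factor is needed afterwards, `factor(df$ynew)` converts the result.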