For example, suppose I have the following plot:

library(ggplot2)
df = diamonds
dfs = df[sample(nrow(df), 100, replace = FALSE),]
ggplot(dfs, aes(x = carat)) +
    geom_histogram(breaks = seq(0, 2, by = 0.5), colour = 'white')  # current ggplot2: geom_bar() does not bin, geom_histogram() accepts breaks

What is the fastest/most elegant way to get the most common cut, the mean depth, or the median price (etc.) for each bin?

1 Answer

library(ggplot2)  # provides the diamonds dataset
df <- diamonds
set.seed(42)
dfs <- df[sample(nrow(df), 100, replace = FALSE),]

library(data.table)

DT <- as.data.table(dfs)
DT[,bins:=findInterval(carat,seq(0,2, by = 0.5))]
setkey(DT,bins)

#most common cut
DT[,names(which.max(table(cut))),by=bins]

#   bins      V1
#1:    1   Ideal
#2:    2 Premium
#3:    3   Ideal
#4:    4 Premium
#5:    5   Ideal

#note that there is a carat==2.01, which you did not plot
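
The question also asks for the mean depth and the median price per bin. These can probably be computed in the same grouped call; below is a minimal sketch of that idea, not a definitive implementation. The `toy` table and its values are made up purely for illustration (they stand in for `DT`), and the result column names are my own choice.

```r
library(data.table)

# Hypothetical toy data standing in for the sampled diamonds (values are invented)
toy <- data.table(
  carat = c(0.3, 0.4, 0.7, 0.8, 0.9, 1.2, 1.4),
  cut   = c("Ideal", "Ideal", "Premium", "Premium", "Good", "Premium", "Premium"),
  depth = c(61, 63, 60, 62, 61, 59, 61),
  price = c(500, 700, 2000, 2500, 3000, 6000, 8000)
)

# Same binning as in the answer: bin i covers [0.5 * (i - 1), 0.5 * i)
toy[, bins := findInterval(carat, seq(0, 2, by = 0.5))]

# Several summaries per bin in one grouped call; keyby also sorts by bins
res <- toy[, .(
  most_common_cut = names(which.max(table(cut))),
  mean_depth      = mean(depth),
  median_price    = median(as.numeric(price))
), keyby = bins]

print(res)
```

With the real data the same expression should work as `DT[, .(...), keyby = bins]`. The `as.numeric()` around `price` is a precaution: `price` is an integer column in `diamonds`, and `median()` on integers can return an integer for some groups and a double for others, which data.table may complain about. Note also that `which.max()` returns the first maximum, so ties between cuts are broken by factor level (or alphabetical) order.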
answered 2013-03-15T12:12:16.240