For example, suppose I have the following plot:
library(ggplot2)
df <- diamonds
# random subset of 100 rows
dfs <- df[sample(nrow(df), 100, replace = FALSE), ]
# histogram of carat with bins of width 0.5 between 0 and 2
ggplot(dfs, aes(x = carat)) +
  geom_histogram(breaks = seq(0, 2, by = 0.5), colour = 'white')
What is the fastest/most elegant way to get the most common cut, the mean depth, or the median price (etc.) of each bin?
df <- diamonds
set.seed(42)
dfs <- df[sample(nrow(df), 100, replace = FALSE),]
library(data.table)
DT <- as.data.table(dfs)
# assign each row the index of its carat bin (same breaks as the plot)
DT[, bins := findInterval(carat, seq(0, 2, by = 0.5))]
setkey(DT, bins)
# most common cut per bin
DT[, names(which.max(table(cut))), by = bins]
# bins V1
#1: 1 Ideal
#2: 2 Premium
#3: 3 Ideal
#4: 4 Premium
#5: 5 Ideal
# note: bin 5 holds carat >= 2 (here a stone with carat == 2.01), which your plot does not show
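The code above only covers the most common cut. Below is a sketch that also computes the other statistics from the question in a single grouped data.table call, assuming the same sample and binning (re-created here so the block runs on its own). The result column names n, common_cut, mean_depth and median_price are illustrative choices, not anything data.table requires. price is an integer column, so it is converted to numeric before median(): otherwise median() may return an integer for odd-sized groups and a double for even-sized ones, which data.table may reject as inconsistently typed across groups.

```r
library(ggplot2)     # provides the diamonds data set
library(data.table)

set.seed(42)
dfs <- diamonds[sample(nrow(diamonds), 100, replace = FALSE), ]
DT <- as.data.table(dfs)

# Same binning as above: findInterval() returns 1 for [0, 0.5), 2 for [0.5, 1),
# ..., and 5 for carat >= 2, so stones above the last break get their own bin.
DT[, bins := findInterval(carat, seq(0, 2, by = 0.5))]

# One grouped pass computes several per-bin summaries at once.
bin_stats <- DT[, .(
  n            = .N,                                # rows in the bin
  common_cut   = names(which.max(table(cut))),      # modal cut
  mean_depth   = mean(depth),                       # mean depth
  median_price = median(as.numeric(price))          # numeric keeps the type stable
), keyby = bins]

print(bin_stats)
```

One caveat on the design: findInterval() uses left-closed intervals, while geom_histogram() closes bins on the right by default, so stones sitting exactly on a break (carat values such as 1.00 are common) can land in a different bin than in the plot. Passing left.open = TRUE to findInterval() should bring the two into line.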