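Here is a solution using data.table (while not specifically requested, it is an obvious complement to, or replacement for, aggregate or ddply). As well as being slightly long to code, repeatedly calling quantile is inefficient, because each call has to sort the data again.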
library(data.table)
Tukeys_five <- c("Min","Q1","Med","Q3","Max")
IRIS <- data.table(iris)
# this will create the wide data.table
lengthBySpecies <- IRIS[,as.list(fivenum(Sepal.Length)), by = Species]
# and you can rename the columns from V1, ..., V5 to something nicer
setnames(lengthBySpecies, paste0('V',1:5), Tukeys_five)
lengthBySpecies
Species Min Q1 Med Q3 Max
1: setosa 4.3 4.8 5.0 5.2 5.8
2: versicolor 4.9 5.6 5.9 6.3 7.0
3: virginica 4.9 6.2 6.5 6.9 7.9
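Or, using a single call to quantile with the appropriate probs argument (abbreviated to prob in the code here, which R accepts through partial argument matching).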
IRIS[,as.list(quantile(Sepal.Length, prob = seq(0,1, by = 0.25))), by = Species]
Species 0% 25% 50% 75% 100%
1: setosa 4.3 4.800 5.0 5.2 5.8
2: versicolor 4.9 5.600 5.9 6.3 7.0
3: virginica 4.9 6.225 6.5 6.9 7.9
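Note that the names of the created columns (0%, 25%, and so on) are not syntactically valid R names, although you could rename them with setnames in the same way as before.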
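As a minimal sketch of that renaming, assuming the same IRIS table and Tukeys_five vector defined earlier (the name quantBySpecies is only illustrative and does not appear in the original answer):

```r
library(data.table)

IRIS <- data.table(iris)
Tukeys_five <- c("Min", "Q1", "Med", "Q3", "Max")

# one quantile() call per group; the value columns come back
# named "0%", "25%", "50%", "75%", "100%"
quantBySpecies <- IRIS[, as.list(quantile(Sepal.Length, probs = seq(0, 1, by = 0.25))),
                       by = Species]

# rename by reference, matching old names to new names positionally
setnames(quantBySpecies, c("0%", "25%", "50%", "75%", "100%"), Tukeys_five)
quantBySpecies
```

Because setnames works by reference, no copy of the table is made and the result does not need to be reassigned.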
EDIT
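Interestingly, quantile sets the names of the resulting vector if names = TRUE (the default), and this makes a copy, which slows down the number crunching and consumes memory. The help page even warns you about it, fancy that!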
Thus, you should probably use
IRIS[,as.list(quantile(Sepal.Length, prob = seq(0,1, by = 0.25), names = FALSE)), by = Species]
Or, if you want to return a named list without R copying internally
IRIS[,{quant <- as.list(quantile(Sepal.Length, prob = seq(0,1, by = 0.25), names = FALSE))
setattr(quant, 'names', Tukeys_five)
quant}, by = Species]