
quantile(X, probs = seq(0, 1, length.out = 5), type = 5)

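How would you carry this over to a data.table operation that adds a new column with := and assigns each ID an ordered value according to the quantile bin its value falls into, e.g. 25% = 1, 50% = 2, and so on?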


2 Answers


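You can use findInterval. This lets you keep using quantile, with any of its quantile definitions (the type argument), to compute the break points.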

For example:

findInterval(x, quantile(x,type=5), rightmost.closed=TRUE)
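
To make the mechanics concrete, here is a minimal base R sketch on a small made-up vector (the data and the names breaks and bin are assumptions for illustration, not part of the question). It computes quartile break points with type = 5 and then looks up the interval each value falls into:

```r
# Small illustrative vector (made up for this sketch)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Quartile break points (0%, 25%, 50%, 75%, 100%) under quantile definition 5
breaks <- quantile(x, probs = seq(0, 1, length.out = 5), type = 5)

# Index of the interval each value falls into.
# rightmost.closed = TRUE keeps the maximum in the top bin
# instead of placing it in an extra bin of its own.
bin <- findInterval(x, breaks, rightmost.closed = TRUE)

print(data.frame(x = x, bin = bin))
```

With five break points there are four intervals, so bin takes values from 1 to 4.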

# findInterval is also fast: benchmark on one million rows
library(data.table)
library(microbenchmark)

set.seed(1)
DT <- data.table(x = rnorm(1e6))

microbenchmark(
  order = DT[order(x), bin := ceiling(.I / .N * 5)],
  findInterval = DT[, b2 := findInterval(x, quantile(x, type = 5), rightmost.closed = TRUE)],
  times = 10
)
## Unit: milliseconds
##         expr       min        lq    median       uq      max neval
##        order 551.31154 568.20324 573.36605 640.3255 655.5024    10
## findInterval  70.16782  79.11459  80.36363 140.2807 147.3080    10
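
If the bins are needed within groups rather than over the whole table, the same expression should work with by. This is only a sketch under assumptions: the grouping column grp, the column name qbin, and the simulated data are hypothetical and do not come from the question.

```r
library(data.table)

set.seed(42)
# Hypothetical table: grp and x are illustrative names
DT <- data.table(grp = rep(c("a", "b"), each = 50), x = rnorm(100))

# Quartile bin of x computed separately within each group;
# := adds the column by reference, so no copy of DT is made
DT[, qbin := findInterval(x, quantile(x, type = 5), rightmost.closed = TRUE),
   by = grp]

# Number of rows per group and bin
print(DT[, .N, keyby = .(grp, qbin)])
```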
answered 2013-10-18T05:34:27.613

For data with no ties, a simple solution is to split it manually by rank:

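library(data.table)
set.seed(1)
DT <- data.table(x = rnorm(20))
# Walk the rows in increasing order of x and cut them into 5 equal-sized bins
DT[order(x), bin := ceiling(.I / .N * 5)]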

which gives

              x bin
 1: -0.62645381   1
 2:  0.18364332   3
 3: -0.83562861   1
 4:  1.59528080   5
 5:  0.32950777   3
 6: -0.82046838   1
 7:  0.48742905   3
 8:  0.73832471   4
 9:  0.57578135   4
10: -0.30538839   2
11:  1.51178117   5
12:  0.38984324   3
13: -0.62124058   2
14: -2.21469989   1
15:  1.12493092   5
16: -0.04493361   2
17: -0.01619026   2
18:  0.94383621   5
19:  0.82122120   4
20:  0.59390132   4
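
The "no ties" caveat matters because a rank-based split can put equal values into different bins. The sketch below illustrates this in base R on made-up data; rank(x, ties.method = "first") is used here as an assumed stand-in for the ordered row position, and the names rank_bin and q_bin are hypothetical.

```r
# Made-up data in which the value 3 occurs four times
x <- c(1, 2, 3, 3, 3, 3, 4, 5, 6, 7)
n <- length(x)

# Rank-based split into 5 equal-sized bins: ties are broken by position,
# so the four 3s end up in two different bins
rank_bin <- ceiling(rank(x, ties.method = "first") * 5 / n)

# Break-point split: equal values are compared against the same
# break points, so they always share a bin
breaks <- quantile(x, probs = seq(0, 1, length.out = 6), type = 5)
q_bin <- findInterval(x, breaks, rightmost.closed = TRUE)

print(data.frame(x = x, rank_bin = rank_bin, q_bin = q_bin))
```

The trade-off is that the rank-based bins always have equal sizes, whereas bins built from break points can be uneven when many values are tied.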
answered 2013-10-18T05:26:58.047