I am quite impressed by how much faster tapply-like operations are with data.table than with data frames.
For example:
library(data.table)
df = data.frame(class = round(runif(1e6,1,1000)), x=rnorm(1e6))
DT = data.table(df)
# takes ages if somefun is complex
res1 = tapply(df$x, df$class, somefun)
# much faster
setkey(DT, class)
res2 = DT[,somefun(x),by=class]
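However, I have not managed to get it to run much faster than data frames for apply-like operations (i.e., cases in which a function needs to be applied to each row).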
df = data.frame(x1 = rnorm(1e6), x2=rnorm(1e6))
DT = data.table(df)
# takes ages if somefun is complex
res1 = apply(df, 1, somefun)
# not much improvement, if at all
DT[,rowid:=.I] # or: DT$rowid = 1:nrow(DT)
setkey(DT, rowid)
res2 = DT[,somefun1(x1,x2),by=rowid]
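A minimal, self-contained version of the row-wise comparison might look like the sketch below. Note that `somefun` and `somefun1` here are hypothetical stand-ins invented purely for illustration (the real functions are assumed to be more complex), and the row count `n` is reduced from 1e6 only to keep the timing short.

```r
library(data.table)

set.seed(1)
n  <- 1e4  # assumption: smaller than the 1e6 used above, to keep run time short
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
DT <- as.data.table(df)

# Hypothetical stand-ins for the real, more complex functions
somefun  <- function(row) row[1] * row[2]  # receives one row as a numeric vector
somefun1 <- function(a, b) a * b           # receives the row's x1 and x2 values

# Row-wise apply on the data frame
t_apply <- system.time(res1 <- apply(df, 1, somefun))

# One group per row in data.table
DT[, rowid := .I]
setkey(DT, rowid)
t_dt <- system.time(res2 <- DT[, somefun1(x1, x2), by = rowid])

# Elapsed seconds for each approach
print(c(apply = t_apply[["elapsed"]], data.table = t_dt[["elapsed"]]))

# Sanity check: both approaches should produce the same values
stopifnot(isTRUE(all.equal(unname(res1), res2$V1)))
```

In both variants the function is still called once per row, so the timings mostly reflect per-call overhead rather than anything specific to the stand-in functions.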
Is this simply to be expected, or are there some tricks?