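I have two apply functions that compute the mean and the standard deviation over the first two dimensions (margins 1:2) of a large 3-dimensional array with dimensions (437216, 8, 3). They take 16 minutes to complete in Rx32. This is the first of many large arrays in our database to which we apply this script regularly. Any ideas on how to speed up the run time?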
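A minimal sketch of the computation described above, assuming it uses `apply()` with margins 1 and 2 (the calls the asker posts later in the thread). The array is shrunk to 1000 × 8 × 3 so it runs quickly; the size and seed are arbitrary choices for illustration.

```r
# Small stand-in for the 437216 x 8 x 3 array (size chosen only for speed)
set.seed(10)
x <- array(rnorm(1000 * 8 * 3), dim = c(1000, 8, 3))

# For every (row, column) cell, summarise the 3 values along the third dimension.
# apply() calls mean()/sd() once per cell: 1000 * 8 = 8000 calls each here,
# and 437216 * 8 (about 3.5 million) calls each on the full-size array.
means <- apply(x, 1:2, mean)
sds   <- apply(x, 1:2, sd)

dim(means)  # 1000 8
dim(sds)    # 1000 8
```

Each result is a 1000 × 8 matrix whose `[i, j]` entry summarises `x[i, j, ]`.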
That seems slow. On my machine,
set.seed(10)
x = array(rnorm(437216*8*3), dim = c(437216,8,3))
system.time(apply(x, 1, mean))
takes:
user system elapsed
23.903 0.263 24.522
FWIW,
system.time(apply(x, 2, mean))
user system elapsed
0.546 0.274 0.841
system.time(apply(x, 3, mean))
user system elapsed
0.516 0.267 0.790
What is your sessionInfo()?
sessionInfo()
R version 2.11.1 (2010-05-31)
i386-apple-darwin9.8.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] cimis_0.1-3 RLastFM_0.1-4 RCurl_1.4-2 bitops_1.0-4.1 XML_3.1-0 lattice_0.18-8
loaded via a namespace (and not attached):
[1] grid_2.11.1 tools_2.11.1
answered 2010-09-10T18:01:41.583
My sessionInfo() is as follows:
sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-pc-mingw32
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] abind_1.1-0 RSQLite_0.9-1 DBI_0.2-5
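The apply functions work over the first and second margins (1:2), with the system times shown below, and I believe that is what makes the script run so long. I ran it on a better computer/system (listed above), which reduced the run time somewhat (below), but it still seems to take longer than it should: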
> system.time(apply(x,1:2,mean))
user system elapsed
311.56 0.30 311.88
> system.time(apply(x,1:2,sd))
user system elapsed
505.92 0.21 506.81
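Both timings reflect roughly 3.5 million (437216 × 8) separate calls to `mean()` or `sd()`. As an alternative not suggested in the thread, base R's `rowMeans()` and `rowSums()` accept a `dims` argument: with `dims = 2` they reduce over the third dimension of a 3-d array and return an n × 8 matrix directly, with no reshaping. A sketch under that approach, checked against `apply()` on a small array; the standard-deviation line uses the k − 1 denominator so it should agree with `sd()`.

```r
set.seed(1)
x <- array(rnorm(50 * 8 * 3), dim = c(50, 8, 3))
k <- dim(x)[3]  # number of values summarised per cell (3 here)

# Mean over the third dimension for every (i, j) cell: a 50 x 8 matrix
m <- rowMeans(x, dims = 2)

# Subtract each cell's mean from all k of its values. c(m) drops the dim
# attribute, so the length-(50 * 8) vector recycles along the third dimension.
dev <- x - c(m)

# Sample standard deviation (denominator k - 1, as in sd())
s <- sqrt(rowSums(dev^2, dims = 2) / (k - 1))

# Should match the apply() results
stopifnot(isTRUE(all.equal(m, apply(x, 1:2, mean))))
stopifnot(isTRUE(all.equal(s, apply(x, 1:2, sd))))
```

Note that `x - m` would fail with "non-conformable arrays" because both operands carry different `dim` attributes; stripping them with `c()` lets ordinary recycling line the values up.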
I'll look into converting it to a data.frame, and into unlisting it as the second suggestion proposes. Thanks for all the help!
answered 2010-09-13T15:29:39.347
EDIT: Once the OP posted the code, the problem became clear. The trick is to convert the array to a data frame:
> x = array(rnorm(437216*8*3), dim = c(437216,8,3))
> system.time(apply(x,1:2,mean))
user system elapsed
107.06 0.18 107.34
# This is run on a new quadcore i7, so it's not a slow machine...
> Tmp <- data.frame(V1=as.vector(x[,,1]),
+ V2=as.vector(x[,,2]),
+ V3= as.vector(x[,,3]))
> system.time({
+ Means <- rowMeans(Tmp)
+ Sd <- sqrt(rowSums((Tmp-Means)^2)/(3-1))
+ })
user system elapsed
6.72 0.40 7.12
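The `Sd` line computes the sample standard deviation of the three values in each row of `Tmp`. Written out for row $r$ with values $x_{r1}, x_{r2}, x_{r3}$ (notation introduced here for illustration), the denominator $3 - 1$ is the same $k - 1$ correction that `sd()` applies, which is why the result can match `apply(x, 1:2, sd)`:

$$
\bar{x}_r = \frac{1}{3}\sum_{c=1}^{3} x_{rc}, \qquad
s_r = \sqrt{\frac{1}{3-1}\sum_{c=1}^{3}\left(x_{rc} - \bar{x}_r\right)^2}
$$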
To get the results back as matrices of the right shape:
Means <- matrix(Means,ncol=8)
Sd <- matrix(Sd,ncol=8)
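Reshaping with `ncol = 8` works because `as.vector()` flattens each `x[,,k]` in column-major order, so row `(j - 1) * n + i` of `Tmp` holds the three values `x[i, j, ]`, and `matrix(..., ncol = 8)` refills the results in that same order. A small check of that mapping; the sizes, seed, and the chosen `i` and `j` are arbitrary.

```r
set.seed(2)
n <- 10
x <- array(rnorm(n * 8 * 3), dim = c(n, 8, 3))
Tmp <- data.frame(V1 = as.vector(x[, , 1]),
                  V2 = as.vector(x[, , 2]),
                  V3 = as.vector(x[, , 3]))

i <- 3
j <- 5
row_idx <- (j - 1) * n + i   # column-major position of cell (i, j)

# That row of Tmp holds the three values of cell (i, j) ...
stopifnot(identical(unlist(Tmp[row_idx, ], use.names = FALSE), x[i, j, ]))

# ... and matrix(..., ncol = 8) puts that row's mean back at [i, j]
Means <- matrix(rowMeans(Tmp), ncol = 8)
stopifnot(isTRUE(all.equal(Means[i, j], mean(x[i, j, ]))))
```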
Proof of concept:
x = array(rnorm(10*8*3), dim = c(10,8,3))
m1 <- apply(x,1:2,mean)
sd1 <- apply(x,1:2,sd)
Tmp <- data.frame(V1=as.vector(x[,,1]),
V2=as.vector(x[,,2]),
V3= as.vector(x[,,3]))
m2 <- rowMeans(Tmp)
sd2 <- sqrt(rowSums((Tmp-m2)^2)/2)
m2 <- matrix(m2,ncol=8)
sd2 <- matrix(sd2,ncol=8)
> all.equal(m1,m2)
[1] TRUE
> all.equal(sd1,sd2)
[1] TRUE
answered 2010-09-10T16:11:35.373
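The data frame itself may not be what provides the speed-up: `rowMeans()` and `rowSums()` convert a data frame to a matrix before computing. A sketch of the same row-wise idea that skips building `Tmp` by reshaping the array straight into an (n·8) × 3 matrix. This variant is not from the answer above; it relies on `matrix()` filling column-major, so column k equals `as.vector(x[,,k])`.

```r
set.seed(3)
n <- 10
x <- array(rnorm(n * 8 * 3), dim = c(n, 8, 3))

# (n*8) x 3 matrix: column k is as.vector(x[, , k]) because matrix() fills column-major
M <- matrix(x, ncol = dim(x)[3])

# Row-wise mean and sample sd; M - m3 recycles m3 down each column
m3  <- rowMeans(M)
sd3 <- sqrt(rowSums((M - m3)^2) / (ncol(M) - 1))

# Back to n x 8, as in the answer
m3  <- matrix(m3,  ncol = 8)
sd3 <- matrix(sd3, ncol = 8)

stopifnot(isTRUE(all.equal(m3,  apply(x, 1:2, mean))))
stopifnot(isTRUE(all.equal(sd3, apply(x, 1:2, sd))))
```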