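I'm confused. I wanted to speed up my algorithm by using mclapply from the parallel package, but when I compare timings, apply still wins.
I am smoothing log2ratio data with rq.fit.fnb from the quantreg package, which is called by my function quantsm, and I wrap the data into a matrix/list for use with apply/lapply (mclapply).
I arrange my data like this: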
q = matrix(data, ncol=N) # wrapping into matrix (using N = 2, 4, 6 or 8)
ql = as.list(as.data.frame(q)) # making list
And the timing comparison:
apply=system.time(apply(q, 1, FUN=quantsm, 0.50, 2))
lapply=system.time(lapply(ql, FUN=quantsm, 0.50, 2))
mc2lapply=system.time(mclapply(ql, FUN=quantsm, 0.50, 2, mc.cores=2))
mc4lapply=system.time(mclapply(ql, FUN=quantsm, 0.50, 2, mc.cores=4))
mc6lapply=system.time(mclapply(ql, FUN=quantsm, 0.50, 2, mc.cores=6))
mc8lapply=system.time(mclapply(ql, FUN=quantsm, 0.50, 2, mc.cores=8))
timing=rbind(apply,lapply,mc2lapply,mc4lapply,mc6lapply,mc8lapply)
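A quick sanity check that may be worth running alongside the timings (a minimal sketch; the rnorm vector is a hypothetical stand-in for the real log2ratio data, and N = 4 is just one of the settings above). It inspects how many calls each variant makes and how long the vector handed to the function is. Note that apply with MARGIN = 1 iterates over the rows of q, whereas ql holds its columns.

```r
# Hypothetical stand-in for the real log2ratio vector
data <- rnorm(2000)
N <- 4

q  <- matrix(data, ncol = N)        # same wrapping as above
ql <- as.list(as.data.frame(q))     # list of N columns

# Length of each vector that apply(q, 1, FUN) passes to FUN (one per row)
row_lengths <- apply(q, 1, length)
# Length of each vector that lapply(ql, FUN) passes to FUN (one per column)
col_lengths <- vapply(ql, length, integer(1))

length(row_lengths)   # number of FUN calls made by apply
unique(row_lengths)   # vector length seen by FUN in each of those calls
length(col_lengths)   # number of FUN calls made by lapply / mclapply
unique(col_lengths)   # vector length seen by FUN in each of those calls
```

If the two variants report different vector lengths, the timings are not measuring the same amount of work per call.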
The function quantsm:
quantsm <- function (y, p = 0.5, lambda) {
    # Quantile smoothing
    # Input: response y, quantile level p (0<p<1), smoothing parameter lambda
    # Result: quantile curve
    # Augment the data for the difference penalty
    m <- length(y)
    E <- diag(m)
    Dmat <- diff(E)
    X <- rbind(E, lambda * Dmat)
    u <- c(y, rep(0, m - 1))
    # Call quantile regression
    q <- rq.fit.fnb(X, u, tau = p)
    q
}
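To make the augmentation step in quantsm concrete, here is a toy sketch with m = 5 (the values of y and lambda are illustrative only). It builds the same matrices as the function above without calling rq.fit.fnb, and shows that the design matrix X is dense with (2m - 1) rows and m columns, so its size grows quadratically with the length of y.

```r
# Illustrative toy input, not real data
y <- c(1, 3, 2, 5, 4)
lambda <- 2

m    <- length(y)
E    <- diag(m)               # m x m identity: one coefficient per data point
Dmat <- diff(E)               # (m - 1) x m first-difference matrix
X    <- rbind(E, lambda * Dmat)   # augmented design matrix
u    <- c(y, rep(0, m - 1))       # augmented response: y followed by zeros

dim(X)          # (2m - 1) rows, m columns
Dmat[1, 1:2]    # each penalty row holds -1 and +1 on adjacent coefficients
length(u) == nrow(X)
```

For a vector of length 2000, X would therefore hold 3999 x 2000 entries, while for a vector of length 4 it holds only 7 x 4.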
The function rq.fit.fnb (from the quantreg library):
rq.fit.fnb <- function (x, y, tau = 0.5, beta = 0.99995, eps = 1e-06)
{
    n <- length(y)
    p <- ncol(x)
    if (n != nrow(x))
        stop("x and y don't match n")
    if (tau < eps || tau > 1 - eps)
        stop("No parametric Frisch-Newton method. Set tau in (0,1)")
    rhs <- (1 - tau) * apply(x, 2, sum)
    d <- rep(1, n)
    u <- rep(1, n)
    wn <- rep(0, 10 * n)
    wn[1:n] <- (1 - tau)
    z <- .Fortran("rqfnb", as.integer(n), as.integer(p), a = as.double(t(as.matrix(x))),
        c = as.double(-y), rhs = as.double(rhs), d = as.double(d),
        as.double(u), beta = as.double(beta), eps = as.double(eps),
        wn = as.double(wn), wp = double((p + 3) * p), it.count = integer(3),
        info = integer(1), PACKAGE = "quantreg")
    coefficients <- -z$wp[1:p]
    names(coefficients) <- dimnames(x)[[2]]
    residuals <- y - x %*% coefficients
    list(coefficients = coefficients, tau = tau, residuals = residuals)
}
For a data vector of length 2000 I get:
(values = elapsed time in seconds; columns = number of columns in the smoothed matrix/list)
             2cols   4cols   6cols   8cols
apply        0.178   0.096   0.069   0.056
lapply      16.555   4.299   1.785   0.972
mc2lapply   11.192   2.089   0.927   0.545
mc4lapply   10.649   1.326   0.694   0.396
mc6lapply   11.271   1.384   0.528   0.320
mc8lapply   10.133   1.390   0.560   0.260
For data of length 4000 I get:
              2cols    4cols    6cols   8cols
apply         0.351    0.187    0.137   0.110
lapply      189.339   32.654   14.544   8.674
mc2lapply   186.047   20.791    7.261   4.231
mc4lapply   185.382   30.286    5.767   2.397
mc6lapply   184.048   30.170    8.059   2.865
mc8lapply   182.611   37.617    7.408   2.842
Why is apply so much more efficient than mclapply? Maybe I am just making some common beginner's mistake.
Thank you for your responses.