r - Optimizing nested foreach dopar in R

Question

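I would like feedback on the structure of my code below, and on whether it should be organized differently to run faster. Specifically, do I need to use foreach and dopar differently in the nested loops? Currently the inner loop does most of the work (a ddply with 1-8 breakdown variables, each with 10-200 levels), and that is the part I run in parallel. I have left out the code details for simplicity.

Any ideas? My code (shown below) does work, but it takes a few hours on a 6-core, 41 GB machine. The dataset is not that large (< 20k records).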

for(m in 1:length(Predictors)){  # has up to three elements in the vector

  # construct the dataframe based on the specified predictor
  # subset the original dataframe based on the breakdown variables, outcome, predictor and covariates

  for(l in 1:nrow(pairwisematrixReduced)){  # this has 1-6 rows;subset based on correct comparison groups

    # some code here

    cl <- makeCluster(detectCores())  
    registerDoParallel(cl) 

    for (i in 1:nrow(subsetting_table)){  # this table has about 50 rows

      # this uses the columns specified by k in the glm; the prior columns will be used as breakdown variables
      # up to 10 covariates
      result[[length(result) + 1]] <- foreach(k = 11:17, .packages=c('plyr','reshape2', 'fastmatch')) %dopar% {   

        ddply( 
          df,
          b,   # vector of breakdown variables
          function(x) { 

           # run a GLM and manipulate the output

          }
          ,.parallel = TRUE) # close ddply
      } # close k loop -- set of covariates
    } # close i loop -- subsetting table
  } #close l -- group combinations
} # close m loop - this is the pairwise predictor matrix 

stopCluster(cl)
result <- unlist(result, recursive = FALSE)
tmp2<-do.call(rbind.fill, result)

score 5 · Accepted Answer

Copied from vignette("nested"):

3 Using %:% with %dopar%

When parallelizing nested for loops, there is always a question of which loop to parallelize. The standard advice is...
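As a rough illustration of the operator the vignette section is named after, here is a minimal sketch of two foreach loops joined with %:% and run with %dopar%. The worker count, the loop ranges, and the toy loop body are assumptions made for the example; they do not come from the question.

```r
library(foreach)
library(doParallel)

# Create the cluster once, outside every loop (2 workers is an arbitrary choice)
cl <- makeCluster(2)
registerDoParallel(cl)

# %:% merges the two foreach loops into a single stream of (i, k) tasks,
# and %dopar% hands those tasks to the registered workers
res <- foreach(i = 1:3, .combine = rbind) %:%
  foreach(k = 1:4, .combine = rbind) %dopar% {
    data.frame(i = i, k = k, value = i * k)  # toy stand-in for the real work
  }

stopCluster(cl)
```

Because the two loops are merged, the workers stay busy even when one of the loops has only a few iterations.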

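You are also using foreach %dopar% together with ddply and .parallel=TRUE. With a six-core processor (and presumably hyperthreading), this means the foreach block will start 12 environments, and ddply will then start 12 environments within each of those, for 144 concurrent environments. The foreach should be changed to %do% to be consistent with the text of your question, which says the inner loop is the one run in parallel. Or, to make it cleaner, change both to foreach and use %dopar% for one loop and %:% for the other.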
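The 12 x 12 = 144 figure comes from parallelizing at two levels at once. A minimal sketch of keeping a single level of parallelism follows, assuming the foreach loop is the one kept parallel and ddply runs serially inside it. The mtcars data, the covariate names, and the 2-worker cluster are hypothetical stand-ins, not the asker's data or settings.

```r
library(foreach)
library(doParallel)
library(plyr)

# One cluster for the whole job, created before any loop starts
cl <- makeCluster(2)  # arbitrary worker count for the sketch
registerDoParallel(cl)

covariates <- c("wt", "hp")  # hypothetical stand-ins for columns 11:17

res <- foreach(v = covariates, .packages = "plyr") %dopar% {
  # ddply runs serially inside each worker: no .parallel = TRUE here
  ddply(mtcars, "cyl", function(x) {
    fit <- glm(reformulate(v, response = "mpg"), data = x)
    data.frame(covariate = v, slope = unname(coef(fit)[2]))
  })
}

stopCluster(cl)
out <- do.call(rbind.fill, res)
```

The opposite arrangement, foreach with %do% and ddply with .parallel = TRUE, also gives one level of parallelism; which is faster likely depends on how evenly the work splits at each level.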
