r - How to write R package documentation for a function with a parallel backend

Question

I want to turn this function into an R package.

Edit

#' create sums package
#'
#' More detailed Description
#'
#' @describeIn This sums helps to
#'
#' @importFrom foreach foreach
#'
#' @importFrom doParallel registerDoParallel
#'
#' @param x Numeric Vector
#'
#' @importFrom foreach `%dopar%`
#'
#' @importFrom parallel detectCores makeCluster stopCluster
#'
#' @export
sums <- function(x){
plan(multisession)
n_cores <- detectCores() # check how many cores are present on the machine
cl <- parallel::makeCluster(n_cores) # use all the detected cores
doParallel::registerDoParallel(cores  =  detectCores())

ss <- function(x){
  `%dopar%` <- foreach::`%dopar%`
  foreach::foreach(i = x, .combine = "+") %dopar% {i}
}

sss <- function(x){
  `%dopar%` <- foreach::`%dopar%`
  foreach::foreach(i = x, .combine = "+") %dopar% {i^2}
}

ssq <- function(x){
  `%dopar%` <- foreach::`%dopar%`
  foreach::foreach(i = x, .combine = "+") %dopar% {i^3}
}

sums <- function(x, methods = c("sum", "squaredsum", "cubedsum")){

  output <- c()

  if("sum" %in% methods){
    output <- c(output, ss = ss(x))
  }

  if("squaredsum" %in% methods){
    output <- c(output, sss = sss(x))
  }

  if("cubedsum" %in% methods){
    output <- c(output, ssq = ssq(x))
  }

  return(output)
}

parallel::stopCluster(cl = cl)
x <- 1:10

sums(x)


What I need

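Suppose my vector x is so large, for example x <- 1:9e9, that serial processing would need about 5 hours to finish the task. That is where parallel processing can help. How do I include: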

n_cores <- detectCores()
#cl <- makeCluster(n_cores)
#registerDoParallel(cores  =  detectCores())

in my .R file and my DESCRIPTION file so that it is suitable for R package documentation?

score 1 · Accepted Answer

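The scope of the question is not easy to make out, but I will try to give some relevant suggestions. As I understand it, you are having trouble checking your package when its examples or tests use parallel computation.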

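First, keep in mind that the check applies CRAN standards and that, for compatibility reasons, a CRAN package may not run examples or tests that use more than 2 cores. Your examples therefore have to be simple enough to be handled by 2 cores.
Second, there is a problem in your code: you create a cluster with makeCluster but never use it in doParallel, because registerDoParallel is called with cores = ... instead of the cluster cl.
Third, your code uses the parallel and doParallel packages, so both must be declared in the DESCRIPTION file. You can do that by running the following in the console: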

usethis::use_package("parallel")
usethis::use_package("doParallel")

This adds both packages to the Imports field of DESCRIPTION. After that, you do not load these libraries explicitly inside your package.
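As a rough illustration of what those two calls change, the sketch below writes a minimal DESCRIPTION to a temporary file and reads the Imports field back with base R's read.dcf. The package name sumspkg, the title, and the version number are made up for the example, and usethis may order or format the field differently.

```r
# Hypothetical DESCRIPTION contents; "sumspkg", the title and the version
# are placeholders, not taken from the question.
desc_lines <- c(
  "Package: sumspkg",
  "Title: Parallel Sums",
  "Version: 0.0.1",
  "Imports:",
  "    doParallel,",
  "    parallel"
)
desc_file <- tempfile(fileext = ".dcf")
writeLines(desc_lines, desc_file)

# read.dcf() parses the Debian control format used by DESCRIPTION files.
imports_field <- read.dcf(desc_file, fields = "Imports")[1, "Imports"]

# Split the comma-separated field into individual package names.
imports <- trimws(strsplit(imports_field, ",")[[1]])
print(imports)
```

Packages listed under Imports are installed together with your package, which is why calls such as parallel::makeCluster() work without any library() call in the package code.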

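You should also qualify the functions in your example with "::" after the name of the package they come from, which makes your example look like this: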

    n_cores <- 2
    cl <- parallel::makeCluster(n_cores)
    doParallel::registerDoParallel(cl = cl)
    ...
    parallel::stopCluster(cl = cl)

You can also look at the registerDoParallel documentation for a similar piece of code. You will see that it is limited to 2 cores as well.
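One way to stay within that limit during checks while still letting users pick more workers is to look at the _R_CHECK_LIMIT_CORES_ environment variable, which the R documentation describes as being set for CRAN submission checks. The sketch below is only one possible way to wire this in. The helper name available_cores is hypothetical and is not part of parallel or doParallel.

```r
# Hypothetical helper: pick a worker count that respects the check limit.
available_cores <- function(maxcores = 2L) {
  detected <- parallel::detectCores()
  if (is.na(detected)) detected <- 1L  # detectCores() may return NA

  # Treat any non-empty value other than "false"/"FALSE" as "limit to 2".
  limit <- Sys.getenv("_R_CHECK_LIMIT_CORES_", unset = "")
  limited <- nzchar(limit) && !(limit %in% c("false", "FALSE"))
  cap <- if (limited) 2L else detected

  as.integer(max(1, min(maxcores, detected, cap)))
}

n_cores <- available_cores(maxcores = 2)
cl <- parallel::makeCluster(n_cores)
squares <- parallel::parSapply(cl, 1:10, function(i) i^2)
parallel::stopCluster(cl)
print(sum(squares))  # 385
```

The default of 2 keeps examples and tests safe. A user who wants more workers can pass a larger maxcores explicitly.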

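For completeness, I don't think you really need the foreach package, because the default parallelization in R is very powerful. If you want to be able to use your function together with detectCores, I suggest adding an argument that limits the number of cores. This function should do what you want in a more "R-like" way: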

sums <- function(x, methods, maxcores) {
  # Cap the worker count at `maxcores` and at the cores the machine reports
  n_cores <- min(maxcores, parallel::detectCores(), na.rm = TRUE)
  cl <- parallel::makeCluster(n_cores)
  # Always release the workers, even if an error occurs below
  on.exit(parallel::stopCluster(cl = cl), add = TRUE)

  outputs <- sapply(
    X = methods,
    FUN = function(method) {
      # Pick the per-element transformation for this method
      f <- switch(
        method,
        sum = function(i) i,
        squaredsum = function(i) i^2,
        cubedsum = function(i) i^3,
        stop("unknown method: ", method)
      )
      output <- parallel::parSapply(cl = cl, X = x, FUN = f)
      # Convert to double so that large integer totals do not overflow to NA
      sum(as.numeric(output))
    }
  )

  outputs
}


x <- 1:10000000

sums(x = x, c("sum", "squaredsum"), 2)
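To connect this back to the documentation side of the question, here is one possible roxygen2 header for such a function. It is a sketch rather than a canonical layout: the wording of the tags is my own, the function is renamed sums_doc (a hypothetical name) so that it does not clash with the one above, and it assumes parallel is listed under Imports in DESCRIPTION. The example is kept small and fixed at 2 cores so that it stays within the check limit.

```r
#' Parallel sums of powers
#'
#' Computes the sum, the sum of squares, and/or the sum of cubes of a
#' numeric vector on a local cluster.
#'
#' @param x Numeric vector.
#' @param methods Character vector with any of `"sum"`, `"squaredsum"`,
#'   `"cubedsum"`.
#' @param maxcores Maximum number of worker processes to start.
#'
#' @return A named numeric vector with one element per method.
#'
#' @examples
#' sums_doc(1:10, methods = c("sum", "squaredsum"), maxcores = 2)
#'
#' @export
sums_doc <- function(x,
                     methods = c("sum", "squaredsum", "cubedsum"),
                     maxcores = 2) {
  methods <- match.arg(methods, several.ok = TRUE)
  # Exponent applied to every element for each method
  powers <- c(sum = 1, squaredsum = 2, cubedsum = 3)

  n_cores <- min(maxcores, parallel::detectCores(), na.rm = TRUE)
  cl <- parallel::makeCluster(n_cores)
  on.exit(parallel::stopCluster(cl), add = TRUE)

  vapply(
    methods,
    function(method) {
      # `^` is called on the workers as `^`(element, exponent)
      sum(parallel::parSapply(cl, x, `^`, powers[[method]]))
    },
    numeric(1)
  )
}

result <- sums_doc(1:10, methods = c("sum", "squaredsum"), maxcores = 2)
print(result)  # sum = 55, squaredsum = 385
```

Because every parallel function is called with "::", no @importFrom tag is needed here. The @export tag is what makes roxygen2 write the function into NAMESPACE.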
