r - A combination of expand.grid and mapply?

Question

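I am trying to come up with a variant of mapply (call it xapply for now) that combines the functionality (sort of) of expand.grid and mapply. That is, for a function FUN and a list of arguments L1, L2, L3, ... of unknown length, it should produce a list of length n1*n2*n3 (where ni is the length of list i) containing the result of applying FUN to every combination of the elements of the lists.

expand.grid might be usable if it worked to generate lists of lists rather than data frames, but I have in mind that the lists may be lists of things that won't necessarily fit nicely into a data frame.

This function works fine if there are exactly three lists to expand, but I am curious about a more generic solution. (FLATTEN is unused, but I can imagine that FLATTEN=FALSE would generate nested lists rather than a single list ...)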

xapply3 <- function(FUN,L1,L2,L3,FLATTEN=TRUE,MoreArgs=NULL) {
  retlist <- list()
  count <- 1
  for (i in seq_along(L1)) {
    for (j in seq_along(L2)) {
      for (k in seq_along(L3)) {
        retlist[[count]] <- do.call(FUN,c(list(L1[[i]],L2[[j]],L3[[k]]),MoreArgs))
        count <- count+1
      }
    }
  }
  retlist
}

Edit: I forgot to return the result. One might be able to solve this by building a list of index lists with combn ...
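The index-based idea can be sketched as follows. This is only an illustration under stated assumptions: it uses expand.grid on the index vectors rather than combn, and the three small lists are made-up stand-ins for L1, L2 and L3. Because only integer indices go into the data frame, the list elements themselves can be arbitrary objects.

```r
# Made-up stand-ins for the three input lists
L1 <- list("a", "b")
L2 <- list(1, 2, 3)
L3 <- list(TRUE, FALSE)
lists <- list(L1, L2, L3)

# One row per combination; each column holds indices into one list
inds <- expand.grid(lapply(lists, seq_along))
nrow(inds)  # n1 * n2 * n3 rows

# Recover the actual elements for one combination (row 5 here)
combo <- Map(function(x, j) x[[j]], lists, as.list(inds[5, ]))
```

Each row of inds can then be turned into an argument list for do.call(FUN, ...), whatever the number of input lists.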

score 2 · Accepted Answer

I think I have a solution to my own question, but perhaps someone can do better (and I haven't implemented FLATTEN=FALSE ...)

xapply <- function(FUN,...,FLATTEN=TRUE,MoreArgs=NULL) {
  L <- list(...)
  inds <- do.call(expand.grid,lapply(L,seq_along)) ## Marek's suggestion
  retlist <- list()
  for (i in seq_len(nrow(inds))) {  ## seq_len() is safe when there are zero rows
    arglist <- mapply(function(x,j) x[[j]],L,as.list(inds[i,]),SIMPLIFY=FALSE)
    if (FLATTEN) {
      retlist[[i]] <- do.call(FUN,c(arglist,MoreArgs))
    }
  }
  retlist
}
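As a possible variation on the loop above, here is a minimal sketch that hands the whole index grid to a single mapply call. The name xapply_map is hypothetical and not part of the original answer, and FLATTEN is left out entirely.

```r
# Hypothetical helper (not from the original answer): same idea as xapply,
# but the explicit loop is replaced by one mapply() call.
xapply_map <- function(FUN, ..., MoreArgs = NULL) {
  L <- list(...)
  # One column of indices per input list; the first list varies fastest
  inds <- expand.grid(lapply(L, seq_along), KEEP.OUT.ATTRS = FALSE)
  # For each input list, pick the elements selected by its index column
  picked <- Map(function(x, j) x[j], L, inds)
  # Walk the picked lists in parallel; SIMPLIFY = FALSE keeps a plain list
  do.call(mapply, c(list(FUN = FUN), unname(picked),
                    list(MoreArgs = MoreArgs, SIMPLIFY = FALSE)))
}

res <- xapply_map(function(a, b) a + b, list(1, 2), list(10, 20, 30))
unlist(res)  # 11 12 21 22 31 32
```

The result order follows expand.grid, so the first list cycles fastest.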

Edit: I tried @baptiste's suggestion, but it's not easy (or not easy for me). The closest I got is

library(plyr)  ## provides mlply
xapply2 <- function(FUN,...,FLATTEN=TRUE,MoreArgs=NULL) {
  L <- list(...)
  xx <- do.call(expand.grid,L)
  f <- function(...) {
    do.call(FUN,lapply(list(...),"[[",1))
  }
  mlply(xx,f)
}

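This still doesn't work. expand.grid is indeed more flexible than I thought (although it creates a weird data frame that can't be printed), but enough magic happens inside mlply that I can't quite make it work.

Here is a test case: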

L1 <- list(data.frame(x=1:10,y=1:10),
           data.frame(x=runif(10),y=runif(10)),
           data.frame(x=rnorm(10),y=rnorm(10)))

L2 <- list(y~1,y~x,y~poly(x,2))          
z <- xapply(lm,L2,L1)
xapply(lm,L2,L1)
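To see which formula and data set each of the nine fits belongs to, one option is to rebuild the same grid with readable labels. This is a sketch under the assumption that the results come back in expand.grid order, as they do in the xapply shown above; the names labels and n_data are made up for illustration.

```r
L2 <- list(y ~ 1, y ~ x, y ~ poly(x, 2))
n_data <- 3  # length of L1 in the test case

# The first column varies fastest, matching the argument order
# used in xapply(lm, L2, L1)
labels <- expand.grid(
  formula = vapply(L2, function(f) paste(deparse(f), collapse = " "),
                   character(1)),
  data = seq_len(n_data),
  stringsAsFactors = FALSE
)
nrow(labels)
labels[4, ]  # the fourth fit: first formula, second data set
```

Row i of labels then describes z[[i]], so something like names(z) <- paste(labels$formula, labels$data, sep = " | ") would tag the fits.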

score 1 · Accepted Answer

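@ben-bolker, I had a similar desire and think I have a preliminary solution worked out, which I have also tested in parallel. The function, which I somewhat confusingly called gmcmapply (g for grid), takes an arbitrarily large named list mvars (which gets expand.grid-ed inside the function) and a FUN that uses the list names as if they were arguments to the function itself. gmcmapply updates the formals of FUN so that, by the time FUN is passed to mcmapply, its arguments reflect the variables the user wants to iterate over (these would be the layers in a nested for loop). mcmapply then dynamically updates the values of these formals as it cycles over the expanded set of variables in mvars.

I've posted the preliminary code as a gist (reproduced below with an example) and would love to get your feedback. I'm a grad student and a self-described intermediate R enthusiast, so this is definitely pushing my R skills. You or others in the community may have suggestions that would improve on what I have. I do think that, even as it stands, I'll be using this function quite a bit in the future.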
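The formals trick described above can be illustrated in isolation. This is a minimal sketch, not the gmcmapply code itself: the toy function f and the variable names l1 and l2 are chosen only for illustration.

```r
# A function whose body refers to l1 and l2 without declaring them
f <- function(...) l1^2 + l2

# Replace the argument list; alist() allows arguments with no default
formals(f) <- alist(l1 = , l2 = )

names(formals(f))      # "l1" "l2"
f(3, 2)                # 11
mapply(f, 1:3, 4:6)    # 5 9 15
```

After the replacement, l1 and l2 are ordinary arguments, so mapply (or mcmapply) can feed them one combination at a time.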

gmcmapply <- function(mvars, FUN, SIMPLIFY = FALSE, mc.cores = 1, ...){
  require(parallel)

  FUN <- match.fun(FUN)
  funArgs <- formals(FUN)[which(names(formals(FUN)) != "...")] # allow for default args to carry over from FUN.

  expand.dots <- list(...) # allows for expanded dot args to be passed as formal args to the user specified function

  # Implement non-default arg substitutions passed through dots.
  if(any(names(funArgs) %in% names(expand.dots))){
    dot_overwrite <- names(funArgs[which(names(funArgs) %in% names(expand.dots))])
    funArgs[dot_overwrite] <- expand.dots[dot_overwrite]

    #for arg naming and matching below.
    expand.dots[dot_overwrite] <- NULL
  }

  ## build grid of mvars to loop over, this ensures that each combination of various inputs is evaluated (equivalent to creating a structure of nested for loops)
  grid <- expand.grid(mvars,KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)

  # specify formals of the function to be evaluated  by merging the grid to mapply over with expanded dot args
  argdefs <- rep(list(bquote()), ncol(grid) + length(expand.dots) + length(funArgs) + 1)
  names(argdefs) <- c(colnames(grid), names(funArgs), names(expand.dots), "...")

  argdefs[which(names(argdefs) %in% names(funArgs))] <- funArgs # carry over FUN's own formals (including defaults).
  argdefs[which(names(argdefs) %in% names(expand.dots))] <- expand.dots # replace with proper dot arg inputs.

  formals(FUN) <- argdefs

  if(SIMPLIFY) {
    #standard mapply
    do.call(mcmapply, c(FUN, c(unname(grid), mc.cores = mc.cores))) # mc.cores = 1 == mapply
  } else{
    #standard Map
    do.call(mcmapply, c(FUN, c(unname(grid), SIMPLIFY = FALSE, mc.cores = mc.cores)))
  }
}

Example code below:

      # Example 1:
      # just make sure variables used in your function appear as the names of mvars
      myfunc <- function(...){
        return_me <- paste(l3, l1^2 + l2, sep = "_")
        return(return_me)
      }

      mvars <- list(l1 = 1:10,
                    l2 = 1:5,
                    l3 = letters[1:3])


      ### list output (SIMPLIFY = FALSE, like Map)
      lreturns <- gmcmapply(mvars, myfunc)

      ### concatenated output (SIMPLIFY = TRUE, like mapply)
      lreturns <- gmcmapply(mvars, myfunc, SIMPLIFY = TRUE)

      ## N.B. This is equivalent (up to the order of the results) to running:
      lreturns <- c()
      for(l1 in 1:10){
        for(l2 in 1:5){
          for(l3 in letters[1:3]){
            lreturns <- c(lreturns,myfunc(l1,l2,l3))
          }
        }
      }

      ### concatenated output run on 2 cores.
      lreturns <- gmcmapply(mvars, myfunc, SIMPLIFY = TRUE, mc.cores = 2)

      # Example 2: Pass non-default args to FUN.
      ## Since the apply functions don't accept full calls as inputs (calls are internal), the user can pass arguments to FUN through dots, which can overwrite a default option for FUN.
      # e.g. apply(x,1,FUN) works and apply(x,1,FUN(arg_to_change= not_default)) does not, the correct way to specify non-default/additional args to FUN is:
      # gmcmapply(mvars, FUN, arg_to_change = not_default)

     ## update myfunc to have a default argument
      myfunc <- function(rep_letters = 3, ...){
        return_me <- paste(rep(l3, rep_letters), l1^2 + l2, sep = "_")
        return(return_me)
      }

      lreturns <- gmcmapply(mvars, myfunc, rep_letters = 1)

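Some additional features I would like to add, but am still trying to work out, are:

Cleaning up the output into a nice nested list with the names of the mvars (normally, I would create multiple lists inside a nested for loop and tag lower-level lists onto higher-level lists until all layers of the massive nested loop were done). I think some abstracted variant of the solution provided here would work, but I haven't figured out how to make it flexible to the number of columns in the expand.grid-ed data.frame.
An option to log the output of the child processes called in mcmapply to a user-specified directory, so that you could look at the .txt output for every combination of variables generated by expand.grid (e.g. if the user prints model summaries or status messages as part of FUN, as I often do). I think a feasible solution is to use the substitute() and body() functions, described here, to edit FUN to open a sink() at the beginning of FUN and close it at the end if the user specified a directory to write to. Right now I just program it directly into FUN itself, but later I would like to just pass gmcmapply an argument called log_children = "path_to_log_dir" and then edit the body of the function to (pseudocode) sink(file = file.path(log_children, paste0(paste(names(mvars), sep = "_"), ".txt")))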
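Both wish-list items can be approximated without touching gmcmapply. The sketch below rests on assumptions: part 1 reshapes a flat result list into a list array instead of building truly nested lists, and part 2 wraps FUN in a closure instead of editing its body with substitute() and body(). The helper with_log is hypothetical, and its behaviour inside forked mcmapply children has not been checked here.

```r
## Part 1: nested-style access via a list array
mvars <- list(l1 = 1:3, l2 = 1:2, l3 = c("a", "b"))
grid <- expand.grid(mvars, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
flat <- Map(function(l1, l2, l3) paste(l3, l1^2 + l2, sep = "_"),
            grid$l1, grid$l2, grid$l3)

# expand.grid() varies the first variable fastest, which is the order in
# which R fills arrays, so the flat list can be reshaped directly
nested <- array(flat, dim = unname(lengths(mvars)),
                dimnames = lapply(mvars, as.character))
nested[["2", "1", "b"]]  # result for l1 = 2, l2 = 1, l3 = "b"

## Part 2: one log file per combination via a wrapper
# with_log() is a hypothetical helper, not part of gmcmapply
with_log <- function(FUN, log_dir) {
  function(...) {
    # Name the log file after the argument values, e.g. "2_a.txt"
    log_file <- file.path(log_dir,
                          paste0(paste(list(...), collapse = "_"), ".txt"))
    sink(log_file)
    on.exit(sink(), add = TRUE)  # restore normal output, even on error
    FUN(...)
  }
}

log_dir <- tempfile("logs")
dir.create(log_dir)
logged <- with_log(function(x, y) {
  cat(sprintf("running %s %s\n", x, y))
  x
}, log_dir)
out <- logged(2, "a")
readLines(file.path(log_dir, "2_a.txt"))
```

The list array works for any number of variables because the dimensions come straight from lengths(mvars), which addresses the concern about the number of expand.grid columns.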

Let me know what you think!

-Nate
