
I recently created an object factor=1 in my workspace, not knowing that the base package already contains a function called factor.

What I intended to do was to use the variable factor within a parallel loop, e.g.,

library(plyr)
library(foreach)
library(doParallel)

workers <- makeCluster(2)
registerDoParallel(workers,cores=2)

factor=1

llply(
  as.list(1:2),
  function(x) factor*x,
  .parallel = TRUE,
  .paropts=list(.export=c("factor"))
     )

This, however, results in an error that took me some time to understand. It seems that plyr creates the object factor in its environment exportEnv, but uses base::factor instead of the user-provided object. See the following example:

llply(
  as.list(1:2),
  function(x) {
    function_env=environment();
    global_env=parent.env(function_env);
    export_env=parent.env(global_env);
    list(
      function_env=function_env,
      global_env=global_env,
      export_env=export_env,
      objects_in_exportenv=unlist(ls(envir=export_env)),
      factor_found_in_envs=find("factor"),
      factor_in_exportenv=get("factor",envir=export_env)
      )
    },
  .parallel = TRUE,
  .paropts=list(.export=c("factor"))
  )

stopCluster(workers)

If we inspect the output of llply, we see that the line factor_in_exportenv=get("factor",envir=export_env) does not return 1 (the user-provided object) but the function definition of base::factor.

Question 1) How can I understand this behavior? I would have expected the output to be 1.

Question 2) Is there a way to get a warning from R if I assign a new value to a name that is already defined in another package (such as factor in my case)?
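
On Question 2: a plain assignment does not produce a warning in R, so any check has to be done by hand. A minimal sketch, assuming a hypothetical helper masks_attached() (not part of any package) that lists the attached packages which already define a given name:

```r
# Hypothetical helper (an assumption for illustration, not from any package):
# returns the attached packages on the search path that already define `name`.
masks_attached <- function(name) {
  pkgs <- grep("^package:", search(), value = TRUE)
  defined <- vapply(
    pkgs,
    function(p) exists(name, envir = as.environment(p), inherits = FALSE),
    logical(1)
  )
  unname(pkgs[defined])
}

hits_factor <- masks_attached("factor")                 # includes "package:base"
hits_none   <- masks_attached("surely_unused_name_xyz") # character(0)

if (length(hits_factor) > 0) {
  message("'factor' is already defined in: ", paste(hits_factor, collapse = ", "))
}
```

Once the object exists in the workspace, base::conflicts(detail = TRUE) gives a similar overview for all masked names.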


2 Answers


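The llply function calls foreach under the hood. foreach uses parent.frame() to determine the environment in which to evaluate. What is the parent.frame in the llply case? It is the function environment of llply, in which factor is not defined.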
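
The parent.frame() point can be illustrated without any packages. A minimal sketch, assuming hypothetical functions helper() and wrapper() that stand in for foreach and llply (an illustration of the mechanism only, not the actual plyr or foreach code):

```r
# helper() stands in for a function that defaults to its caller's environment
helper <- function(envir = parent.frame()) envir

# wrapper() stands in for llply: it calls helper() from inside its own body
wrapper <- function() {
  own_env <- environment()
  identical(helper(), own_env)
}

# TRUE: helper() receives wrapper()'s environment, not the user's workspace
wrapper_result <- wrapper()
wrapper_result
```

So when the call goes through a wrapper function, variable lookup starts in the wrapper's environment rather than in the environment of the user's own code.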

Instead of using llply, why not use foreach directly?

library(plyr)
library(foreach)
library(doParallel)

workers <- makeCluster(2)
registerDoParallel(workers,cores=2)

factor=1
foreach(x=1:2) %dopar% {factor*x}

Note that you do not even need the .export argument, because in this case the export happens automatically.

Answered 2016-12-09T03:35:32.040

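First, I should note that the error disappears if one uses a variable name that is not used in base, for example a instead of factor. This clearly indicates that llply finds base::factor (a function) before factor (the variable with value 1) along its search path. I have tried to replicate this issue with a simplified version of llply, i.e.,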

library(plyr)
library(foreach)
library(doParallel)

workers <- makeCluster(2)
registerDoParallel(workers,cores=2)

factor=1

llply_simple=function(.x,.fun,.paropts) {
  #give current environment a name
  tmpEnv=environment()
  attr(tmpEnv,"name")="llply_simple_body"
  #print all enclosing envirs of llply_simple_body (see def of allEnv below)
  print(allEnv(tmpEnv))
  cat("------\nResults:\n")
  do.ply=function(i) {
    .fun(i)
  }
  fe_call <- as.call(c(list(quote(foreach::foreach), i = .x), .paropts))
  fe <- eval(fe_call)
  foreach::`%dopar%`(fe, do.ply(i))
}

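llply_simple uses a recursive helper function (allEnv) that walks through all enclosing environments. It returns a vector containing the names of all of these environments: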

allEnv=function(x) {
  if (environmentName(x)=="R_EmptyEnv") {
    return(environmentName(x))
  } else {
    c(environmentName(x),allEnv(parent.env(x)))
  }
}

Interestingly, the simplified function actually works as expected (i.e., it gives 1 and 2 as the results):

llply_simple(1:2,function(x) x*factor,list(.export="factor"))
#[1] "llply_simple_body"  "R_GlobalEnv"        "package:doParallel" "package:parallel"  
#[5] "package:iterators"  "package:foreach"    "package:plyr"       "tools:rstudio"     
#[9] "package:stats"      "package:graphics"   "package:grDevices"  "package:utils"     
#[13] "package:datasets"   "package:methods"    "Autoloads"          "base"              
#[17] "R_EmptyEnv"
#--------
#Results:        
#[[1]]
#[1] 1
#
#[[2]]
#[1] 2

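So the only significant difference between llply_simple and the full plyr::llply function is that the latter belongs to a package. Let's try to move llply_simple into a package.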

package.skeleton(list=c("llply_simple","allEnv"),name="llplyTest")
unlink("./llplyTest/DESCRIPTION")
devtools::create_description("./llplyTest",
                             extra=list("devtools.desc.author"='"T <t@t.com>"'))
tmp=readLines("./llplyTest/man/llply_simple.Rd")
tmp[which(grepl("\\\\title",tmp))+1]="Test1"
writeLines(tmp,"./llplyTest/man/llply_simple.Rd")
tmp=readLines("./llplyTest/man/allEnv.Rd")
tmp[which(grepl("\\\\title",tmp))+1]="Test2"
writeLines(tmp,"./llplyTest/man/allEnv.Rd")
devtools::install("./llplyTest")

Now try to execute llplyTest::llply_simple from our new package llplyTest:

library(llplyTest)
llplyTest::llply_simple(1:2,function(x) x*factor,list(.export="factor"))
#[1] "llply_simple_body"  "llplyTest"          "imports:llplyTest"  "base"              
#[5] "R_GlobalEnv"        "package:doParallel" "package:parallel"   "package:iterators" 
#[9] "package:foreach"    "package:plyr"       "tools:rstudio"      "package:stats"     
#[13] "package:graphics"   "package:grDevices"  "package:utils"      "package:datasets"  
#[17] "package:methods"    "Autoloads"          "base"               "R_EmptyEnv"
#------
#Results:
#Error in do.ply(i) : 
#  task 1 failed - "non-numeric argument to binary operator"

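Suddenly, we get the same error as in my original question from 2013. So the issue is clearly connected to calling the function from a package. Let's look at the output of allEnv: it gives us the sequence of environments that llply_simple and llplyTest::llply_simple search for the variables that should be exported. It is actually foreach that does the exporting. If you are interested in why foreach really starts from the environment we named llply_simple_body, look at the source code of foreach::`%dopar%`, foreach:::getDoPar and foreach:::.foreachGlobals$fun, and follow the path of the envir argument.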

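We can now clearly see that the non-package version has a different search order than llplyTest::llply_simple, and that the package version finds factor in base first.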
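
The difference in search order can be reproduced without building a package. A minimal sketch, assuming two hypothetical environments, pkg_like and body_like, that imitate a namespace and a function body inside it; the real chain also contains the imports environment, which is left out here:

```r
# The user's object lives in the global environment
assign("factor", 1, envir = globalenv())

# Hypothetical stand-ins: a namespace-like environment enclosed by the base
# namespace, and a function-body-like environment inside it
pkg_like  <- new.env(parent = .BaseNamespaceEnv)
body_like <- new.env(parent = pkg_like)

# Lookup starting at the workspace finds the user's value first
from_global <- get("factor", envir = globalenv())

# Lookup starting inside the package-like chain reaches base before the workspace
from_pkg <- get("factor", envir = body_like)

is.numeric(from_global)  # TRUE
is.function(from_pkg)    # TRUE
```

This matches the allEnv output above, where base appears before R_GlobalEnv for the package version.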

Answered 2016-12-10T17:02:13.117