I am trying to get my head around the Snowfall library and its usage.

Having written a simulation that makes use of environments, I encountered the following issue: if I source a file to load functions within parallel mode, the function seems to use a different environment than when I declare the function within parallel mode directly.

To make things a little clearer, let's consider the following two scripts:

q_func.R declares the function

foo.bar <- function(x, envname) assign("val", x, envir = get(envname))
# assigns the value x to the variable "val" in the environment envname

q_snowfall.R main function that uses snowfall

library(snowfall)
SnowFunc <- function(envname) {
    # load the functions

    # Option 1 not working
    source("q_func.R")
    # Option 2 working...
    # foo.bar <- function(x, envname) assign("val", x, envir = get(envname))


    # create the new environment
    assign(envname, new.env())

    # use the function as declared in q_func.R 
    # to assign random numbers to the new env
    foo.bar(x = rnorm(1), envname = envname)

    # return the environment including the random values
    return(get("val", envir = get(envname)))
}

sfInit(parallel = TRUE, cpus = 2)
# create environment 'a' and 'b' that each will get a new variable 
# called 'val' that gets assigned a random value

envs <- c("a", "b")
result <- sfClusterApplyLB(envs, SnowFunc)
sfStop()

If I execute the script "q_snowfall.R" I get the error

Error in checkForRemoteErrors(val) : 
  2 nodes produced errors; first error: object 'a' not found

However, if I use the second option (declaring the function within the SnowFunc function), the error disappears.

Do you know how Snowfall handles the different environments? Or do you even have a solution for the issue? (Note that 'q_func.R' actually contains some 100 lines of code, so I would prefer to keep it in a separate file; "keep option 2" is therefore not a solution!)

Thank you very much!

Edit: If I change all get(envname) to get(envname, envir = globalenv()), it seems to work. But this seems to me more or less a workaround and not a very snowfall-like solution.


1 Answer


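I think the problem is not with snowfall, but with the fact that you are passing the environment by name (as a character string). You do not need to change every occurrence of get, and having it look in globalenv() may indeed not be safe.

It is enough to change the get call in foo.bar so that it looks in parent.frame() (that is, the environment from which foo.bar was called). The following works on my machine.

New q_func.R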

foo.bar <- function(x, envname) assign("val", x, envir=get(envname,
                                pos=parent.frame()))
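
To see why the lookup location matters, here is a minimal sketch that runs without a cluster. It assumes a fresh session with no global object named a; the names foo.global, foo.caller and caller are hypothetical and only serve the illustration. A function defined at top level (which is also where source() puts it by default, since local = FALSE) has the global environment as its enclosure, so a plain get(envname) never searches the frame of the function that created the environment.

```r
# Defined at top level: get() searches this function's frame, then the
# global environment, but never the frame of whoever called it.
foo.global <- function(x, envname) assign("val", x, envir = get(envname))

# Same idea, but the lookup starts in the caller's frame.
foo.caller <- function(x, envname) {
  caller.frame <- parent.frame()
  assign("val", x, envir = get(envname, envir = caller.frame))
}

# Stand-in for SnowFunc: creates the environment in its own frame.
caller <- function(f, envname) {
  assign(envname, new.env())
  f(x = 42, envname = envname)
  get("val", envir = get(envname))
}

# Returns NULL if the call fails because 'a' is not visible to f.
res.global <- tryCatch(caller(foo.global, "a"), error = function(e) NULL)
res.caller <- caller(foo.caller, "a")
```

Under that assumption, the first call errors out (so res.global is NULL) while the second one returns the stored value.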

(Not so) new q_snowfall.R

library(snowfall)
SnowFunc <- function(envname) {

    assign(envname, new.env())
    foo.bar(x = rnorm(1), envname = envname)

    return(get("val", envir = get(envname)))
}

source("q_func.R")
sfInit(parallel = TRUE, cpus = 2)
sfExport("foo.bar")

envs <- c("a", "b")
result <- sfClusterApplyLB(envs, SnowFunc)
sfStop()

Also note that I source the file before starting the cluster and use sfExport to export foo.bar to each node.
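
Since the root of the problem is passing the environment by name, another option is to pass the environment object itself, so that no get lookup is needed at all. The following is only a sketch: foo.bar.env and SnowFuncEnv are hypothetical names, not part of the original scripts, and lapply stands in for sfClusterApplyLB so the example runs without a cluster.

```r
# Hypothetical variant of foo.bar that takes the environment object directly.
foo.bar.env <- function(x, env) assign("val", x, envir = env)

# Hypothetical variant of SnowFunc; envname is kept only to mirror the
# original call signature and is not used for any lookup.
SnowFuncEnv <- function(envname) {
  env <- new.env()
  foo.bar.env(x = rnorm(1), env = env)
  get("val", envir = env)
}

envs <- c("a", "b")
result <- lapply(envs, SnowFuncEnv)
```

With a cluster, foo.bar.env would presumably still have to be exported with sfExport, just like foo.bar above.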

Answered 2015-07-26T11:38:18.803