
I am trying to run R code with the snow package. I have the function

imp <- function(x, y)

How can I use this function with clusterApply?

cl <- makeCluster(c("localhost","localhost"), type = "SOCK")
clusterApply(cl, 1:6, get("+"), 3)
stopCluster(cl)

Instead of this, I want to use my own function:

cl <- makeCluster(c("localhost","localhost"), type = "SOCK")
clusterApply(cl, imp(dataset,3), 3)
stopCluster(cl)

Assuming this is my function, how can I run it on a parallel and distributed system?

impap <- function(x, y)
{
  mat <- as.matrix(x)   # coerce the input to a matrix
  res <- mat + y        # element-wise addition
  print(res)
}
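A minimal sketch of how the function above could be handed to clusterApply with snow (assuming `dataset` is a matrix defined on the master; the values `1:3` are illustrative). Because clusterApply passes each element of the sequence as the first unnamed argument, supplying `x` by name routes the matrix to impap's `x` parameter while the varying element fills `y`. The function object itself is serialized with the call, so it does not need clusterExport:

```r
library(snow)

cl <- makeCluster(c("localhost", "localhost"), type = "SOCK")
dataset <- matrix(1:6, nrow = 2)   # example data; replace with your own

# each element of 1:3 becomes impap's y; x is supplied by name
res <- clusterApply(cl, 1:3, impap, x = dataset)

stopCluster(cl)
```

Note that `print()` output on the workers is normally discarded unless you start the cluster with an `outfile` setting; results reach the master through the return value, which is why returning the computed matrix (rather than only printing it) is the more reliable pattern.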

1 Answer


I tend to prefer snowfall for parallel and distributed computing. Here is generic code that parallelizes well in both settings with only minor modification; it also writes a log file for each instance, which makes progress and error tracking easier.

rm(list = ls()) #remove all past workspace variables
n_projs=5 #this is the number of iterations. Each of them gets sent to an available CPU core
proj_name_root="model_run_test"
proj_names=paste0(proj_name_root,"__",c(1:n_projs))

#FUNCTION TO RUN
project_exec=function(proj_name){
  cat('starting with', proj_name, '\n')
  ##ADD CODE HERE
  cat('done with ', proj_name, '\n')
}

require(snowfall)
# Init Snowfall with settings from sfCluster
cpucores=as.integer(Sys.getenv('NUMBER_OF_PROCESSORS')) #Windows env var; on other OSes use parallel::detectCores()

#TWO WAYS TO RUN (CLUSTER OR SINGLE MACHINE)
hosts=c(commandArgs(TRUE)) #list of strings with computer names in cluster (needed by sfInit below)
sfInit(socketHosts=hosts, parallel=T, cpus=cpucores, type="SOCK", slaveOutfile="/home/cluster_user/output.log")

##BELOW IS THE CODE IF YOU ARE RUNNING PARALLEL IN THE SAME MACHINE (MULTI CORE)
#sfInit(parallel=T, cpus=cpucores) #This is where you would need to configure snowfall to create a cluster with the AWS instances 

#sfLibrary(sp) ##import libraries used in your function here into your snowfall instances
sfExportAll()
all_reps=sfLapply(proj_names,fun=project_exec)
sfRemoveAll()
sfStop()
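To apply the template above to the impap function from the question, a single-machine snowfall sketch might look like the following (`dataset` and the `1:3` offsets are assumed for illustration; swap `sfInit` to the `socketHosts` form for a multi-machine cluster):

```r
library(snowfall)

# return the shifted matrix so the result travels back to the master
impap <- function(x, y) {
  as.matrix(x) + y
}

sfInit(parallel = TRUE, cpus = 2)   # local multi-core mode
dataset <- matrix(1:6, nrow = 2)
sfExport("dataset")                 # make the data visible on the workers
res <- sfLapply(1:3, function(y) impap(dataset, y))
sfStop()
```

`res` is then a list with one shifted copy of `dataset` per offset. sfExport is needed here because the anonymous function looks up `dataset` in the workers' global environment.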
answered 2014-03-31T18:36:48.867