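I am splitting my dataset by simulation ID and applying a runjags function to every sub-dataset simultaneously. This lets me take advantage of parallel processing and run my simulations on a cluster. A job that would otherwise take more than a day finishes in about 2 hours.
Of the 1000 simulations I run, around 25 fail. I am using a for loop with tryCatch to extract the coda files from the simulations that ran successfully. The problem is that once a simulation fails, the error affects all the cores I am using, and I can no longer extract the coda files for the simulations that did run successfully.
Any suggestions? I have included the code and the log file below. Thank you.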
library(parallel)
library(coda)

# 1) Use mclapply to apply a function to all simulations simultaneously
output_models <- parallel::mclapply(subsetdata, function(x) {
  library(runjags)
  set.seed(1)
  model_data <- x
  runJagsOut <- run.jags(method = "simple",
                         model = "tempModel.txt",
                         monitor = c("mu"),
                         data = model_data,
                         # inits = initsList,  # NOTE: Let JAGS initialize.
                         n.chains = 1,         # NOTE: Not only 1 chain.
                         adapt = 500,
                         burnin = 3000,
                         sample = 2500,
                         thin = 1,
                         summarise = TRUE,
                         plots = FALSE)
  return(runJagsOut)
}, mc.cores = numcores)

# 2) Build an empty list
mcmc <- list()

# 3) Extract the coda samples for each simulation. tryCatch is in place to 'ignore' simulations that fail
for (SimulID in 1:length(unique(df$SimulID))) {
  tryCatch({
    mcmc[[SimulID]] <- cbind(output_models[[SimulID]][["mcmc"]][[1]], SimulID)
  }, error = function(e) { cat("ERROR :", conditionMessage(e), "\n") })
}

# 4) Write the coda samples of every simulation to one main text file
lapply(mcmc, function(x) write.table(data.frame(x), 'output.txt', append = TRUE, sep = ',', col.names = FALSE))
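One variation on step 1 that may be worth trying is to move the tryCatch inside the function passed to mclapply, so that a failing simulation returns a value instead of raising an error on its core. The sketch below is only an illustration under assumptions: `fit_one` is a hypothetical stand-in for the run.jags call (it fails on purpose for some inputs), and `inputs` and `n_cores` are made-up names standing in for `subsetdata` and `numcores`.

```r
library(parallel)

# Hypothetical stand-in for the run.jags() call: fails for every fourth input.
fit_one <- function(x) {
  if (x %% 4 == 0) stop("simulated failure for input ", x)
  list(mcmc = list(matrix(x, nrow = 2, ncol = 1, dimnames = list(NULL, "mu"))))
}

# mclapply relies on forking, which is unavailable on Windows; use one core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

inputs <- 1:8

# Catch the error inside the worker, so a failure comes back as NULL
# rather than propagating out of the function that mclapply is running.
results <- mclapply(inputs, function(x) {
  tryCatch(fit_one(x), error = function(e) {
    message("Simulation ", x, " failed: ", conditionMessage(e))
    NULL
  })
}, mc.cores = n_cores)

# Keep only the successful runs and tag each with its simulation ID.
ok <- !vapply(results, is.null, logical(1))
mcmc <- lapply(which(ok), function(i) {
  cbind(results[[i]][["mcmc"]][[1]], SimulID = i)
})
```

With this shape, the extraction step no longer needs its own tryCatch, because every element of `results` is either a fitted object or NULL.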
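This is what I see in the log file after the job finishes on the cluster. The output of the 1000th simulation is followed by a warning from mclapply, and then by subscript out of bounds errors from the for loop.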
. Initializing model
. Adapting 500
-------------------------------------------------| 500
++++++++++++++++++++++++++++++++++++++++++++++++++ 100%
Adaptation successful
. Updating 3000
-------------------------------------------------| 3000
************************************************** 100%
. . Updating 2500
-------------------------------------------------| 2500
************************************************** 100%
. . . . Updating 0
. Deleting model
.
Simulation complete. Reading coda files...
Coda files loaded successfully
Calculating summary statistics...
Finished running the simulation
Warning message:
In parallel::mclapply(subsetdata, function(x) { :
scheduled cores 39, 24, 35, 21, 22, 23, 29, 3, 8, 19, 47, 34, 6, 48, 10, 33, 38, 41, 31, 18, 5, 16, 37 encountered errors in user code, all values of the jobs will be affected
ERROR : subscript out of bounds
ERROR : subscript out of bounds
ERROR : subscript out of bounds
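The wording of the warning ("all values of the jobs will be affected") looks consistent with mclapply's default `mc.preschedule = TRUE`, where the inputs are split into one batch per core and an error in any element turns the whole batch into try-error results. A possible workaround, sketched here under assumptions, is to fork one job per input with `mc.preschedule = FALSE` and to test each result's class before extracting. `fit_one` and `inputs` are hypothetical stand-ins for the run.jags call and `subsetdata`, and the example assumes a Unix-alike system where forking is available.

```r
library(parallel)

# Hypothetical stand-in for the run.jags() call: fails for every fourth input.
fit_one <- function(x) {
  if (x %% 4 == 0) stop("simulated failure for input ", x)
  list(mcmc = list(matrix(x, nrow = 2, ncol = 1, dimnames = list(NULL, "mu"))))
}

inputs <- 1:8

# mc.preschedule = FALSE forks a separate job for each input instead of
# one batch per core, so a failure only affects the input that raised it.
results <- mclapply(inputs, fit_one, mc.cores = 2, mc.preschedule = FALSE)

# Failed jobs come back as "try-error" objects; flag them explicitly
# instead of relying on a subscript error during extraction.
failed <- vapply(results, inherits, logical(1), what = "try-error")

# Extract the samples from the successful runs only, tagged with their ID.
mcmc <- lapply(which(!failed), function(i) {
  cbind(results[[i]][["mcmc"]][[1]], SimulID = i)
})
```

mclapply still warns that some calls resulted in an error, but the other results stay usable. Forking per input adds some overhead, which is probably negligible next to an MCMC run of several thousand iterations.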