r - Are there any caveats to calling other packages inside foreach and dopar with doMC?

Question

This code works as expected:

library(dplyr)
data <- list(t1 = "hello world.", t2 = "bye world")

library(doMC)
registerDoMC(3)

res <- foreach(t = data) %dopar% {

    print(sprintf("processing %s", t))

    data.frame(text = t) %>%
        dplyr::count(text)

}

print(res)

However, this code only prints "processing hello world." and "processing bye world", and then hangs (no exception is thrown).

library(dplyr)
coreNLP::initCoreNLP()

data <- list(t1 = "hello world.", t2 = "bye world")

library(doMC)
registerDoMC(3)

res <- foreach(t = data) %dopar% {

    print(sprintf("processing %s", t))

    coreNLP::annotateString(t)$token

}

print(res)

The code above works as expected if I change %dopar% to %do%.

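I don't understand what causes this behavior. Why does calling a coreNLP function inside %dopar% make R hang, while other packages work fine? Is it related to coreNLP's dependency on Java?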

Here is the output of sessionInfo():

R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.0

score 1 · Accepted Answer

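Your first example works fine for me on a similar-looking setup. My session info after running the example is below; make sure you retry in a fresh R session (R --vanilla). I have four cores (according to parallel::detectCores()).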

sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] doMC_1.3.4      iterators_1.0.8 foreach_1.4.3   dplyr_0.5.0    

loaded via a namespace (and not attached):
[1] compiler_3.4.0   magrittr_1.5     R6_2.2.0         assertthat_0.2.0
[5] DBI_0.6-1        tibble_1.3.0     Rcpp_0.12.10     codetools_0.2-15

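Your second example does not work for me either; the output is below. My guess is that the forked processes cannot share the underlying Java process/service that coreNLP depends on, but I don't know coreNLP well.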

> res <- foreach(t = data) %dopar% {
+ 
+     print(sprintf("processing %s", t))
+ 
+     coreNLP::annotateString(t)$token
+ 
+ }
[1] "processing hello world."
[1] "processing bye world"


^CError in selectChildren(ac, 1) : 
  Java called System.exit(130) requesting R to quit - trying to recover
Error during wrapup: C stack usage  591577121812 is too close to the limit

 *** caught segfault ***
address 0x2, cause 'memory not mapped'
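
If the guess above holds (forked children inherit a Java VM they cannot use), one way to test it is to avoid forking altogether: start independent worker processes and run the initialization once inside each of them. The sketch below uses only the base parallel package. `worker_state` and `annotate_stub` are hypothetical stand-ins so that it runs without Java. Whether calling coreNLP::initCoreNLP() on each worker actually resolves the hang is an untested assumption.

```r
library(parallel)

texts <- list(t1 = "hello world.", t2 = "bye world")

# Start two fresh R processes (PSOCK workers) instead of forking the
# current session as doMC does; nothing is inherited from the parent.
cl <- makeCluster(2)

# One-time setup on every worker. With coreNLP one would presumably call
# coreNLP::initCoreNLP() here (assumption, not tested); a stand-in object
# is created instead so the sketch runs without Java.
invisible(clusterEvalQ(cl, {
    worker_state <- list(pid = Sys.getpid(), ready = TRUE)  # hypothetical stand-in
    NULL
}))

# Hypothetical stand-in for coreNLP::annotateString(t)$token.
# It reads worker_state from the worker's own global environment.
annotate_stub <- function(t) {
    stopifnot(worker_state$ready)
    data.frame(token = strsplit(t, " ", fixed = TRUE)[[1]],
               worker_pid = worker_state$pid,
               stringsAsFactors = FALSE)
}

res <- parLapply(cl, texts, annotate_stub)
stopCluster(cl)

print(res)
```

To keep the foreach syntax, the same cluster object can be registered with doParallel::registerDoParallel(cl) in place of registerDoMC(3). The per-worker setup would still be needed before the loop.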
