
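I have an entrez command that I'm running through a loop in R. It seems to work fine for a while, but eventually I get an error that I'm having a hard time figuring out: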

Error in system(command = paste0(<string modified in loop>,  : 
  cannot popen '<paste'd string>', probable reason 'Too many open files'

The loop below starts failing at iteration 1020:

GeneraAddresses <- vector(mode = "list",
                          length = length(PossibleGenera))

for (m1 in seq_along(GeneraAddresses)) {
  GeneraAddresses[[m1]] <- try(system(command = paste0("esearch -db assembly -query ",
                                                       "'",
                                                       PossibleGenera[m1],
                                                       "[organism] AND \"complete genome\"[filter]",
                                                       " AND \"latest genbank\"[filter]",
                                                       " AND \"genbank has annotation\"[Properties]",
                                                       "'",
                                                       " | ",
                                                       "efetch -format docsum",
                                                       " | ",
                                                       "xtract -pattern DocumentSummary -block FtpPath",
                                                       ' -match "@type:genbank"',
                                                       " -element FtpPath"),
                                      timeout = 300L,
                                      intern = TRUE))

  print(showConnections(all = TRUE))
  print(m1)
  closeAllConnections()
}

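To reproduce this, you don't need the actual vector of genera I'm trying to pull from:

rep("streptomyces", 2000) -> PossibleGenera

should do just fine. I haven't been able to find a way to make the error appear earlier (which would make it easier to diagnose), and asking R to close all open connections doesn't seem to help (that call is included in the code above). I know I could resort to breaking my vector into smaller pieces and running it that way, but that feels a bit like giving up.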
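Since system() runs the command through popen() (the error message says so) rather than through an R connection, showConnections() and closeAllConnections() may simply not see whatever is accumulating. One way to get an earlier signal is to compare how many file descriptors the R process currently holds against its open-file limit. This is a minimal sketch, assuming a Unix-like system that exposes /dev/fd (macOS and Linux do) and a shell that supports `ulimit -n`; `count_open_fds()` is a hypothetical helper, not part of base R:

```r
# Hypothetical helper: approximate number of file descriptors this R process
# has open right now. /dev/fd lists the calling process's descriptors on
# macOS and Linux; list.files() briefly opens one descriptor itself while
# reading the directory, so treat the number as approximate.
count_open_fds <- function() {
  length(list.files("/dev/fd"))
}

# Soft limit on open files. The child shell inherits it from R, so asking the
# shell reports R's own limit. It may come back as the string "unlimited".
fd_limit <- system("ulimit -n", intern = TRUE)

cat("open descriptors:", count_open_fds(), "\n")
cat("soft limit:      ", fd_limit, "\n")
```

Calling count_open_fds() every few iterations inside the loop above would show whether the count climbs steadily. If it does, a failure around iteration 1020 would be consistent with that count reaching the limit.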

Running on macOS:

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] Heron_0.0.0.9023    igraph_1.2.4.2      ape_5.3             stringr_1.4.0       DECIPHER_2.14.0     RSQLite_2.2.0       Biostrings_2.54.0  
 [8] XVector_0.26.0      IRanges_2.20.2      S4Vectors_0.24.3    BiocGenerics_0.32.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3      magrittr_1.5    zlibbioc_1.32.0 bit_1.1-15.1    lattice_0.20-38 rlang_0.4.3     blob_1.2.1      tools_3.6.2     grid_3.6.2     
[10] nlme_3.1-143    DBI_1.1.0       bit64_0.9-7     digest_0.6.23   vctrs_0.2.2     memoise_1.1.0   stringi_1.4.5   compiler_3.6.2  pkgconfig_2.0.3

1 Answer


This seems to be related to the timeout argument! This loop:

Iterations <- 2000L
ItOut <- vector(mode = "character",
                length = Iterations)  # one slot per iteration

for (m1 in seq_len(Iterations)) {
  system(command = paste0("echo ",
                          '"',
                          "Iteration ",
                          m1,
                          '"'),
         timeout = 300L,
         intern = TRUE) -> ItOut[m1]
  print(m1)
}

print("completed successfully!")

fails on iteration 1021, giving:

...
[1] 1019
[1] 1020
Error in system(command = paste0("echo ", "\"", "Iteration ", m1, "\""),  : 
  cannot popen 'echo "Iteration 1021"', probable reason 'Too many open files'
Execution halted

whereas simply commenting out the offending line lets the loop run to completion:

Iterations <- 2000L
ItOut <- vector(mode = "character",
                length = Iterations)  # one slot per iteration

for (m1 in seq_len(Iterations)) {
  system(command = paste0("echo ",
                          '"',
                          "Iteration ",
                          m1,
                          '"'),
         # timeout = 300L,
         intern = TRUE) -> ItOut[m1]
  print(m1)
}

print("completed successfully!")

Apparently I just needed to sleep on it.
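To check whether the timeout argument leaks descriptors on a particular R installation, one could run the same number of trivial calls with and without it and compare the descriptor count before and after each run. This is a sketch under the same /dev/fd assumption; `fd_growth()` is a hypothetical helper, and the numbers it prints depend on the R version and platform, so no particular result is implied:

```r
# Approximate count of this R process's open file descriptors (via /dev/fd).
count_open_fds <- function() length(list.files("/dev/fd"))

# Hypothetical helper: run `n` trivial system() calls, optionally passing a
# timeout, and return how many descriptors were gained over the whole run.
fd_growth <- function(n, use_timeout) {
  before <- count_open_fds()
  for (i in seq_len(n)) {
    if (use_timeout) {
      system("echo x", intern = TRUE, timeout = 300L)
    } else {
      system("echo x", intern = TRUE)
    }
  }
  count_open_fds() - before
}

growth_plain   <- fd_growth(100L, use_timeout = FALSE)
growth_timeout <- fd_growth(100L, use_timeout = TRUE)
cat("descriptors gained without timeout:", growth_plain, "\n")
cat("descriptors gained with timeout:   ", growth_timeout, "\n")
```

A with-timeout figure that grows with the number of calls, while the plain figure stays near zero, would point at the timeout code path rather than at esearch, efetch or xtract.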

answered 2020-01-29T15:57:01.210