
I have a strange situation. I am using rentrez. When I run entrez_search(), then entrez_summary(), then entrez_fetch(), I get this error message (full code at the bottom of the post):

Error: HTTP failure: 400
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20131226//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd">
<eFetchResult>
    <ERROR>Cannot retrieve history data. query_key: 1, WebEnv: NCID_1_51629226_130.14.18.34_9001_1531773486_1795859931_0MetA0_S_MegaStore, retstart: 0, retmax: 552</ERROR>
    <ERROR>Can't fetch uids from history because of: NCBI C++ Exception:
    Error: UNK_MODULE(CException::eInvalid) "UNK_FILE", line 18446744073709551615: UNK_FUNC --- 
</ERROR>
</eFetchResult>

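After searching around, I thought I had found the solution in this discussion about query size. When I reduced retmax_set from 500 to 10, the code worked. I then iterated to find the largest value of retmax_set that does not raise the error, and found what looks to me like very odd behavior.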

The search term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]" yields 552 records. Running my code with different values of retmax_set:

  • Setting retmax_set <= 183 works
  • Setting retmax_set >= 184 gives the error above

The modified search term_set = "transcription AND enhancer AND promoter AND 2018[PDAT]" yields 186 records. Running this search with different values of retmax_set:

  • Setting retmax_set <= 61 works
  • Setting retmax_set >= 62 gives the error above

The search term_set = "transcription AND enhancer AND promoter AND 2017[PDAT]" yields 395 records (for some reason, PubMed flags 29 records as published in both 2017 and 2018). Running my code on this search term with different values of retmax_set:

  • Setting retmax_set <= 131 works
  • Setting retmax_set >= 132 gives the error above

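Interestingly, all three searches start failing once retmax_set reaches one third of the total number of records or more (552 / 3 = 184, 186 / 3 = 62, 395 / 3 = 131.67). I will modify my code to compute retmax_set from the number of results returned by entrez_search(), but I have no idea why rentrez or NCBI behaves this way. Any ideas?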
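The one-third pattern can be checked with plain arithmetic. The sketch below uses only the record counts and thresholds reported in this post and makes no network calls. The names `largest_working`, `below_third`, and `n_batches` are hypothetical, introduced only for illustration, and nothing here explains why the failure occurs.

```r
## Record counts and observed thresholds, as reported in this post
counts          <- c(552, 186, 395)
largest_working <- c(183, 61, 131)   # largest retmax_set that worked

## Largest integer strictly below one third of each count
below_third <- ceiling(counts / 3) - 1

## Number of entrez_summary() calls the loop makes for a given batch size
## (hypothetical helper mirroring seq(0, search$count - 1, retmax_set))
n_batches <- function(count, retmax) length(seq(0, count - 1, by = retmax))

working <- mapply(n_batches, counts, largest_working)       # batch sizes that worked
failing <- mapply(n_batches, counts, largest_working + 1)   # batch sizes that failed
```

With these numbers, `below_third` matches `largest_working`, every working batch size leads to four summary requests before the fetch, and every failing one leads to three. That only restates the observation in terms of loop iterations; it does not establish a cause.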

> ##  set search term 
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ##  load package
> library(rentrez)
> ##  set maximum records batch
> retmax_set = 182
> ##  search pubmed using web history
> search <- entrez_search(
+   db = "pubmed", 
+   term = term_set, 
+   use_history = T
+ )
> ##  get summaries of search hits 
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+     summary1 <- entrez_summary(
+         db = "pubmed", 
+         web_history = search$web_history, 
+         retmax = retmax_set, 
+         retstart = seq_start
+     )
+     summary <- c(summary, summary1)
+ }
> ##  download full XML refs for hits
> XML_refs <- entrez_fetch(
+     db = "pubmed", 
+     web_history = search$web_history, 
+     rettype = "xml", 
+     parsed = TRUE
+ )
> 
> 
> ##  set search term 
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ##  load package
> library(rentrez)
> ##  set maximum records batch
> retmax_set = 183
> ##  search pubmed using web history
> search <- entrez_search(
+   db = "pubmed", 
+   term = term_set, 
+   use_history = T
+ )
> ##  get summaries of search hits 
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+     summary1 <- entrez_summary(
+         db = "pubmed", 
+         web_history = search$web_history, 
+         retmax = retmax_set, 
+         retstart = seq_start
+     )
+     summary <- c(summary, summary1)
+ }
> ##  download full XML refs for hits
> XML_refs <- entrez_fetch(
+     db = "pubmed", 
+     web_history = search$web_history, 
+     rettype = "xml", 
+     parsed = TRUE
+ )
> 
> 
> ##  set search term 
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ##  load package
> library(rentrez)
> ##  set maximum records batch
> retmax_set = 184
> ##  search pubmed using web history
> search <- entrez_search(
+   db = "pubmed", 
+   term = term_set, 
+   use_history = T
+ )
> ##  get summaries of search hits 
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+     summary1 <- entrez_summary(
+         db = "pubmed", 
+         web_history = search$web_history, 
+         retmax = retmax_set, 
+         retstart = seq_start
+     )
+     summary <- c(summary, summary1)
+ }
> ##  download full XML refs for hits
> XML_refs <- entrez_fetch(
+     db = "pubmed", 
+     web_history = search$web_history, 
+     rettype = "xml", 
+     parsed = TRUE
+ )
Error: HTTP failure: 400
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20131226//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd">
<eFetchResult>
    <ERROR>Cannot retrieve history data. query_key: 1, WebEnv: NCID_1_51629226_130.14.18.34_9001_1531773486_1795859931_0MetA0_S_MegaStore, retstart: 0, retmax: 552</ERROR>
    <ERROR>Can't fetch uids from history because of: NCBI C++ Exception:
    Error: UNK_MODULE(CException::eInvalid) "UNK_FILE", line 18446744073709551615: UNK_FUNC --- 
</ERROR>
</eFetchResult>
> 
> 
> ##  set search term 
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ##  load package
> library(rentrez)
> ##  set maximum records batch
> retmax_set = 185
> ##  search pubmed using web history
> search <- entrez_search(
+   db = "pubmed", 
+   term = term_set, 
+   use_history = T
+ )
> ##  get summaries of search hits 
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+     summary1 <- entrez_summary(
+         db = "pubmed", 
+         web_history = search$web_history, 
+         retmax = retmax_set, 
+         retstart = seq_start
+     )
+     summary <- c(summary, summary1)
+ }
> ##  download full XML refs for hits
> XML_refs <- entrez_fetch(
+     db = "pubmed", 
+     web_history = search$web_history, 
+     rettype = "xml", 
+     parsed = TRUE
+ )
Error: HTTP failure: 400
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20131226//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd">
<eFetchResult>
    <ERROR>Cannot retrieve history data. query_key: 1, WebEnv: NCID_1_52654089_130.14.22.215_9001_1531773493_484860305_0MetA0_S_MegaStore, retstart: 0, retmax: 552</ERROR>
    <ERROR>Can't fetch uids from history because of: NCBI C++ Exception:
    Error: UNK_MODULE(CException::eInvalid) "UNK_FILE", line 18446744073709551615: UNK_FUNC --- 
</ERROR>
</eFetchResult>

1 Answer


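It turns out that rentrez uses 0-based counting. So 552 records correspond to retstart values from 0 to 551. Because my code was asking for values 1 to 552, it missed the first record (#0) and then threw an error when it looked for the nonexistent record #552.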
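To make the off-by-one concrete, here is a minimal sketch comparing 0-based and 1-based offsets for the 552-record search. It needs no network access. The batch size of 100 and the variable names are assumptions chosen only for illustration.

```r
count  <- 552   # records returned by the search in the question
retmax <- 100   # assumed batch size, for illustration only

valid_offsets <- seq(0, count - 1)   # 0-based offsets: 0..551
one_based     <- seq(1, count)       # 1-based offsets: 1..552

missed      <- setdiff(valid_offsets, one_based)   # record skipped by 1-based paging
nonexistent <- setdiff(one_based, valid_offsets)   # offset past the last record

## Batch starts that stay within the valid 0-based range
batch_starts <- seq(0, count - 1, by = retmax)
```

Here `missed` is 0 and `nonexistent` is 552, which are the two symptoms described above, while `batch_starts` runs from 0 to 500 and never leaves the valid range.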

Answered 2019-02-08T17:10:49.337