
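To get away from the trouble with RISmed (see Problems with RISmed and large(ish) datasets), I decided to use rentrez and entrez_summary to retrieve a large batch of publication titles from a query: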

library(rentrez) # load first so set_entrez_key() is available
rm(list = ls())
set_entrez_key("######") # I did provide my real API key here
Sys.getenv("ENTREZ_KEY")
query="(United States[AD] AND France[AD] AND 1995:2020[PDAT])"
results<-entrez_search(db="pubmed",term=query,use_history=TRUE)
results
results$web_history
# Fetch summaries in batches of 100 from the server-side history.
# retstart is 0-based, so the last batch starts below results$count,
# and each batch is requested exactly once.
summary.append.l <- list()
for (seq_start in seq(0, results$count - 1, 100)) {
    summary.append.l <- append(
        summary.append.l,
        entrez_summary(
            db = "pubmed",
            web_history = results$web_history,
            retmax = 100,
            retstart = seq_start
        )
    )
    Sys.sleep(0.1) # slow things down in case THAT'S a factor here....
}
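The paging arithmetic in the loop can be checked without touching the network. This is a minimal sketch with hypothetical helper names (`batch_starts` and `covered_indices` are not rentrez functions): it computes the 0-based `retstart` offsets for batches of `retmax` records and confirms that every index from 0 to `count - 1` is requested exactly once. Ending the sequence at `count` rather than `count - 1` would add an extra request past the last record whenever `count` is a multiple of the batch size.

```r
# Hypothetical helper (not part of rentrez): 0-based retstart offsets
# so that records 0..count-1 are each requested exactly once.
batch_starts <- function(count, batch_size = 100) {
  # seq(0, -1, by = ...) would error, so handle an empty result set
  if (count <= 0) return(integer(0))
  as.integer(seq(0, count - 1, by = batch_size))
}

# Hypothetical check: expand each batch into the record indices it covers.
covered_indices <- function(count, batch_size = 100) {
  unlist(lapply(batch_starts(count, batch_size), function(s) {
    seq(s, min(s + batch_size, count) - 1)
  }))
}

batch_starts(250)  # 0 100 200
batch_starts(200)  # 0 100 (no empty request at 200)
```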

The good news... I don't get flatly rejected by NCBI the way I did with RISmed and EUtilsGet. The bad news... it doesn't finish. I get either

Error in curl::curl_fetch_memory(url, handle = handle) : 
  transfer closed with outstanding read data remaining

or

Error: parse error: premature EOF
                                       
                     (right here) ------^
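Both messages indicate that a response ended before all of its data arrived (curl reports the transfer closing early; the JSON parser hits end-of-input mid-document). One generic mitigation, offered here only as an assumption-laden sketch and not as an explanation of the cause, is to retry a failed batch a few times with a growing pause. `with_retries` is a hypothetical helper built on base R's `tryCatch`; it is not part of rentrez.

```r
# Hypothetical helper (not part of rentrez): call f() up to `tries` times,
# sleeping `wait` seconds after each failure and doubling the wait.
with_retries <- function(f, tries = 3, wait = 1) {
  for (attempt in seq_len(tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt < tries) {
      message("Attempt ", attempt, " failed: ", conditionMessage(result))
      Sys.sleep(wait)
      wait <- wait * 2
    }
  }
  stop(result)  # re-signal the last error after all attempts fail
}

# Possible use inside the batch loop (assumes `results` and `seq_start`):
# batch <- with_retries(function() entrez_summary(
#   db = "pubmed", web_history = results$web_history,
#   retmax = 100, retstart = seq_start))
```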

I'm almost thinking there's something about using affiliation search strings in the query, because if I change the query to

query="monoclonal[Title] AND antibody[Title] AND 2010:2020[PDAT]"

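it runs to completion, even though it has roughly the same number of records to process. So... any ideas why a particular search string would cause problems on the NCBI server?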
