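To get away from the trouble with RISmed (see "Problems with RISmed and large(ish) data sets"), I decided to use rentrez and entrez_summary to retrieve a large number of publication titles from a query: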
rm(list = ls())
library(rentrez) # must be loaded before set_entrez_key() is called
set_entrez_key("######") # I did provide my real API key here
Sys.getenv("ENTREZ_KEY") # confirm the key is set
query="(United States[AD] AND France[AD] AND 1995:2020[PDAT])"
results<-entrez_search(db="pubmed",term=query,use_history=TRUE)
results
results$web_history
# Start from an empty list so the first batch is not fetched and appended twice.
summary.append.l <- list()
# Stop at count - 1 so no request starts past the last record (assumes count > 0).
for (seq_start in seq(0, results$count - 1, 100)) {
  Sys.sleep(0.1) # slow things down in case THAT'S a factor here....
  summary.append.l <- append(
    summary.append.l,
    entrez_summary(
      db = "pubmed",
      web_history = results$web_history,
      retmax = 100,
      retstart = seq_start
    )
  )
}
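Since the failures look like connections dropped mid-transfer, one thing that could be tried is retrying each batch a few times before giving up. This is only a minimal sketch: `fetch_with_retry` is a hypothetical helper of my own (it is not part of rentrez), and the retry count and wait time are arbitrary assumptions.

```r
# Hypothetical helper (not part of rentrez): call `fetch_fn` and retry on error.
fetch_with_retry <- function(fetch_fn, max_tries = 3, wait = 1) {
  for (attempt in seq_len(max_tries)) {
    # Capture the error object instead of letting it abort the loop.
    result <- tryCatch(fetch_fn(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) {
      Sys.sleep(wait * attempt) # wait a little longer after each failure
    }
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(result))
}
```

Inside the loop, the call would then be wrapped as `fetch_with_retry(function() entrez_summary(db = "pubmed", web_history = results$web_history, retmax = 100, retstart = seq_start))`. This would not explain the failures, but it might show whether they are transient.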
The good news... I am not being flat-out rejected by NCBI the way I was with RISmed and EUtilsGet. The bad news... it does not finish. I get either
Error in curl::curl_fetch_memory(url, handle = handle) :
transfer closed with outstanding read data remaining
or
Error: parse error: premature EOF
(right here) ------^
I am starting to suspect that the affiliation search strings in the query have something to do with it, because if I change the query to
query="monoclonal[Title] AND antibody[Title] AND 2010:2020[PDAT]"
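it runs to completion, even though it has roughly the same number of records to process. So... any ideas why a particular search string would cause problems on the NCBI servers?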
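To check whether the failures are tied to particular records rather than to the query as a whole, one option would be to keep going past a failed batch and record its offset. The following is a rough sketch under that assumption; `fetch_all_batches` is a hypothetical helper (not a rentrez function), and `fetch_fn` stands in for whatever call retrieves one batch at a given `retstart`.

```r
# Hypothetical helper: run `fetch_fn(start)` for every offset and keep going
# when a batch fails, remembering which offsets could not be retrieved.
fetch_all_batches <- function(starts, fetch_fn) {
  records <- list()
  failed <- numeric(0)
  for (start in starts) {
    # A failed batch is represented as NULL so the loop can continue.
    batch <- tryCatch(fetch_fn(start), error = function(e) NULL)
    if (is.null(batch)) {
      failed <- c(failed, start)
    } else {
      records <- c(records, batch)
    }
  }
  list(records = records, failed = failed)
}
```

With rentrez, `fetch_fn` could be something like `function(s) entrez_summary(db = "pubmed", web_history = results$web_history, retmax = 100, retstart = s)`. If the same offsets show up in `failed` on every run, that would point toward specific records; if they move around between runs, a transient connection or server issue seems more likely.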