
I am terrible at removing for loops from my code, because I find them very intuitive and I learned C++ first. Below, I fetch the IDs for a search (in this case "copd"), then use each ID to retrieve its full XML record and save its affiliation/location to a vector. I don't know how to speed this up; it takes about 5 minutes to run on 700 IDs, and most searches have 70,000+ IDs. Any guidance is appreciated.

library(rentrez)
library(XML)

# number of articles for term copd
count <- entrez_search(db = "pubmed", term = "copd")$count

# set max to count
id <- entrez_search(db = "pubmed", term = "copd", retmax = count)$ids

# empty vector that will soon contain locations
location <- character()

# get all location data 
# loop over the IDs actually returned (may be fewer than count)
for (i in seq_along(id))
{
  # get ID of each search
  test <- entrez_fetch(db = "pubmed", id = id[i], rettype = "XML")

  # convert to XML
  test_list <- XML::xmlToList(test)

  # retrieve location
  location <- c(location, test_list$PubmedArticle$MedlineCitation$Article$AuthorList$Author$AffiliationInfo$Affiliation)
}

1 Answer


This might give you a start: it seems you can pull down multiple records at once.

library(rentrez)
library(xml2)

# number of articles for term copd
count <- entrez_search(db = "pubmed", term = "copd")$count

# set max to count
id_search <- entrez_search(db = "pubmed", term = "copd", retmax = count, use_history = TRUE)

# get all
document <- entrez_fetch(db = "pubmed", rettype = "XML", web_history = id_search$web_history)

document_list <- as_list(read_xml(document))

The problem is that this is still time-consuming because there are so many documents. It is also odd that, when I tried it, it returned exactly 10,000 articles, so there may be a limit on how much you can retrieve in one request.
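If the cap is on the fetch size rather than the search, one workaround is to page through the stored web history in chunks. EFetch accepts `retstart`/`retmax`, and `entrez_fetch` passes extra arguments through to the API; the chunk size of 500 and the XPath `//AffiliationInfo/Affiliation` are assumptions based on the PubMed XML layout, not something tested against the full result set. A sketch:

```r
library(rentrez)
library(xml2)

search <- entrez_search(db = "pubmed", term = "copd", use_history = TRUE)

chunk_size <- 500           # assumed safe page size, well under the 10,000 cap
affiliations <- character()

for (start in seq(0, search$count - 1, by = chunk_size)) {
  # fetch one page of records from the stored history
  chunk <- entrez_fetch(db = "pubmed", rettype = "XML",
                        web_history = search$web_history,
                        retstart = start, retmax = chunk_size)
  doc <- read_xml(chunk)
  # collect every affiliation string in this page in one XPath query
  affiliations <- c(affiliations,
                    xml_text(xml_find_all(doc, "//AffiliationInfo/Affiliation")))
}
```

This trades one giant request for a few dozen moderate ones, which is still far fewer HTTP round-trips than one request per ID.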

You could then use something like the purrr package to start extracting the information you want.
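To make that concrete, here is a sketch of pulling the first author's affiliation out of each article in `document_list` with purrr. The element path (`MedlineCitation > Article > AuthorList > Author > AffiliationInfo > Affiliation`) is assumed from the PubMed XML schema, and `pluck()`'s `.default` keeps articles without an affiliation from breaking the loop:

```r
library(purrr)

# document_list$PubmedArticleSet is a list of PubmedArticle elements
locations <- map_chr(
  document_list$PubmedArticleSet,
  ~ pluck(.x, "MedlineCitation", "Article", "AuthorList", "Author",
          "AffiliationInfo", "Affiliation", 1,
          .default = NA_character_)
)
```

Note this only grabs the first author's first affiliation per article, matching what the original loop did; articles with missing data come back as `NA` rather than silently shortening the vector the way `c()` in the loop does.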

Answered 2018-05-23T02:37:12.733