I am collecting author and article information for a search term in PubMed. I can successfully retrieve author names, publication year, and other fields using entrez_fetch in the rentrez package. The following is my example code:
library(rentrez)
library(XML)
# Search PubMed and fetch the matching records as a parsed XML document
pubmedSearch <- entrez_search("pubmed", term = "flexible ureteroscope", retmax = 100)
SearchResults <- entrez_fetch(db="pubmed", pubmedSearch$ids, rettype="xml", parsed=TRUE)
# Each query below runs over the whole document, so the results are flat vectors
# of different lengths with no link back to the article they came from
First_Name <- xpathSApply(SearchResults, "//Author", function(x) {xmlValue(x[["ForeName"]])})
Last_Name <- xpathSApply(SearchResults, "//Author", function(x) {xmlValue(x[["LastName"]])})
PubYear <- xpathSApply(SearchResults, "//PubDate", function(x) {xmlValue(x[["Year"]])})
PMID <- xpathSApply(SearchResults, "//ArticleIdList", function(x) {xmlValue(x[["ArticleId"]])})
Although I get all the information I need, I cannot figure out which authors belong to which PMID, because the number of authors differs from article to article. For example, if I parse author information for 100 articles as in my code, I get more than 100 author names and cannot associate them with their respective PMIDs. Overall, I would like an output data frame like this:
PMID      First_Name  Last_Name         PubYear
28221147  Carlos      Torrecilla Ortiz  2017
28221147  Sergi       Colom Feixas      2017
28208536  Dean G      Assimos           2017
28203551  Chad M      Gridley           2017
28203551  Bodo E      Knudsen           2017
This way, I would know which authors are associated with which PMID, which would be useful for further analysis.
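To make the target concrete, here is a minimal sketch of the direction I suspect is needed: loop over each PubmedArticle node and query its authors relative to that node, so the PMID can be repeated once per author. It runs on a small hand-written XML fragment rather than a live fetch; the fragment's layout is my assumption about what PubMed returns, and first_value and article_to_rows are hypothetical helper names of mine, not part of rentrez or XML.

```r
library(XML)

# Hand-written fragment imitating the PubMed layout (an assumption, heavily trimmed)
xml_text <- '
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>28221147</PMID>
      <Article>
        <Journal><JournalIssue><PubDate><Year>2017</Year></PubDate></JournalIssue></Journal>
        <AuthorList>
          <Author><LastName>Torrecilla Ortiz</LastName><ForeName>Carlos</ForeName></Author>
          <Author><LastName>Colom Feixas</LastName><ForeName>Sergi</ForeName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>28208536</PMID>
      <Article>
        <Journal><JournalIssue><PubDate><Year>2017</Year></PubDate></JournalIssue></Journal>
        <AuthorList>
          <Author><LastName>Assimos</LastName><ForeName>Dean G</ForeName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>'
doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: first match of a relative XPath under `node`, or NA if absent
first_value <- function(node, path) {
  hits <- xpathSApply(node, path, xmlValue)
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# Hypothetical helper: one data frame row per author of a single article
article_to_rows <- function(article) {
  authors <- getNodeSet(article, ".//AuthorList/Author")
  if (length(authors) == 0) return(NULL)  # skip articles without listed authors
  data.frame(
    PMID       = first_value(article, "./MedlineCitation/PMID"),
    First_Name = vapply(authors, first_value, character(1), path = "./ForeName"),
    Last_Name  = vapply(authors, first_value, character(1), path = "./LastName"),
    PubYear    = first_value(article, ".//JournalIssue/PubDate/Year"),
    stringsAsFactors = FALSE
  )
}

# Stack the per-article data frames; PMID and PubYear are recycled for each author
articles <- getNodeSet(doc, "//PubmedArticle")
result <- do.call(rbind, lapply(articles, article_to_rows))
print(result)
```

If this is roughly right, I would presumably pass SearchResults in place of doc. I have not checked how it behaves on records that lack a Year element or a ForeName; in this sketch those would simply come back as NA.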
Just to note, this is a small example of my code. I am collecting more information by parsing the XML returned by entrez_fetch in the rentrez package.
This problem is really bugging me and I would really appreciate any help or guidance. Thank you for your efforts and help in advance.