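I am writing a Python script (adapted from here and reproduced below) that searches PubMed for the number of papers from a given university and downloads the co-authors' affiliations. When I run the code, instead of the affiliation text I get <Element 'Affiliation' at 0x106ea7e50>.
Do you know how to fix this? And what should I do to download the affiliations for all authors, not just one? Thanks!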
import urllib, urllib2, sys
import xml.etree.ElementTree as ET

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in xrange(0, len(seq), size))

query = '(("University of Copenhagen"[Affiliation]))# AND ("1920"[Publication Date] : "1930"[Publication Date]))'
esearch = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&mindate=2001&maxdate=2010&retmode=xml&retmax=10000000&term=%s' % (query)
handle = urllib.urlopen(esearch)
data = handle.read()
root = ET.fromstring(data)
ids = [x.text for x in root.findall("IdList/Id")]
print 'Got %d articles' % (len(ids))

for group in chunker(ids, 100):
    efetch = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?&db=pubmed&retmode=xml&id=%s" % (','.join(group))
    handle = urllib.urlopen(efetch)
    data = handle.read()
    root = ET.fromstring(data)
    for article in root.findall("PubmedArticle"):
        pmid = article.find("MedlineCitation/PMID").text
        year = article.find("MedlineCitation/Article/Journal/JournalIssue/PubDate/Year")
        if year is None: year = 'NA'
        else: year = year.text
        aulist = article.findall("MedlineCitation/Article/AuthorList/Author")
        affiliation = article.find("MedlineCitation/Article/AuthorList/Author/Affiliation")
        print pmid, year, len(aulist), affiliation
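
For what it is worth, the output looks like that because find() returns an Element object rather than its text, and it only returns the first match, so at most one author's affiliation would ever be reported. A minimal sketch of one way around both points is below. It is written for Python 3 (unlike the script above) and runs against a made-up SAMPLE_XML record instead of a live efetch response. The helper name author_affiliations is hypothetical, and using iter("Affiliation") is an assumption meant to tolerate records where the element is nested deeper under Author.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record for illustration only, not real PubMed data.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <AuthorList>
          <Author>
            <LastName>Doe</LastName>
            <Affiliation>University of Copenhagen</Affiliation>
          </Author>
          <Author>
            <LastName>Roe</LastName>
          </Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def author_affiliations(article):
    """Return one list of affiliation strings per author of the article."""
    result = []
    for author in article.findall("MedlineCitation/Article/AuthorList/Author"):
        # iter() walks the author's whole subtree, so an Affiliation element
        # is found whether it is a direct child or nested one level deeper.
        texts = [aff.text for aff in author.iter("Affiliation") if aff.text]
        result.append(texts)
    return result


root = ET.fromstring(SAMPLE_XML)
for article in root.findall("PubmedArticle"):
    pmid = article.find("MedlineCitation/PMID").text
    # Authors without an affiliation get an empty list.
    print(pmid, author_affiliations(article))
```

In the original loop, the same idea would mean replacing the single article.find(...) call for the affiliation with a loop over aulist and reading .text from each Affiliation element found.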