
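I am writing a Python script (modified from here and reproduced below) that searches PubMed for the number of papers from a given university and downloads the co-authors' affiliations. When I run the code, I get <Element 'Affiliation' at 0x106ea7e50> instead of the affiliation. Do you know how to fix this? And what should I do to download the affiliations of all authors? Thanks!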

import urllib, urllib2, sys
import xml.etree.ElementTree as ET

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in xrange(0, len(seq), size))

query = '(("University of Copenhagen"[Affiliation]))# AND ("1920"[Publication Date] : "1930"[Publication Date]))'

esearch = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&mindate=2001&maxdate=2010&retmode=xml&retmax=10000000&term=%s' % (query)
handle = urllib.urlopen(esearch)
data = handle.read()

root = ET.fromstring(data)
ids = [x.text for x in root.findall("IdList/Id")]
print 'Got %d articles' % (len(ids))

for group in chunker(ids, 100):
    efetch = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?&db=pubmed&retmode=xml&id=%s" % (','.join(group))
    handle = urllib.urlopen(efetch)
    data = handle.read()

    root = ET.fromstring(data)
    for article in root.findall("PubmedArticle"):
        pmid = article.find("MedlineCitation/PMID").text
        year = article.find("MedlineCitation/Article/Journal/JournalIssue/PubDate/Year")
        if year is None: year = 'NA'
        else: year = year.text
        aulist = article.findall("MedlineCitation/Article/AuthorList/Author")
        affiliation = article.find("MedlineCitation/Article/AuthorList/Author/Affiliation")
        print pmid, year, len(aulist), affiliation

2 Answers


This happens because the affiliation object refers to an XML element, not to a piece of text. If the string you want is the element's content, like this:

<affiliation>
    your_affiliation_text
</affiliation>  

then you want to print affiliation.text.
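A minimal sketch of the difference, using a small hand-written fragment shaped like a PubMed Author record (the fragment and its values are made up for illustration). find() returns the Element object, whose repr is exactly what the question prints; .text, or findtext(), gives the string inside the tag:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like a PubMed <Author> record (illustrative data)
xml_text = """<Author>
  <LastName>Doe</LastName>
  <Affiliation>University of Copenhagen, Copenhagen, Denmark.</Affiliation>
</Author>"""

author = ET.fromstring(xml_text)

affiliation = author.find("Affiliation")
print(affiliation)       # the Element itself: <Element 'Affiliation' at 0x...>
print(affiliation.text)  # the text inside the tag

# findtext() returns the text directly and takes a default for missing tags,
# which avoids calling .text on None when an author has no such element
print(author.findtext("Affiliation", default="NA"))
print(author.findtext("Email", default="NA"))  # no <Email> child, so "NA"
```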

If the string you want is stored in an attribute, like this:

 <affiliation your_attribute_name="your_affiliation">

then you want affiliation.attrib[name], where name is the attribute name (here "your_attribute_name").
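For the attribute case, a short sketch using the ValidYN attribute that PubMed puts on Author elements (the element below is a hand-written stand-in, not fetched data; "NoSuchAttribute" is a deliberately hypothetical name). attrib is a plain dict, so indexing a missing name raises KeyError, while get() returns None or a default you supply:

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a PubMed <Author> element that carries an attribute
author = ET.fromstring('<Author ValidYN="Y"><LastName>Tang</LastName></Author>')

print(author.attrib["ValidYN"])             # Y (raises KeyError if absent)
print(author.get("ValidYN"))                # Y (returns None if absent)
print(author.get("NoSuchAttribute", "NA"))  # hypothetical missing attribute -> NA
```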

answered 2014-11-03T21:39:02.073

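This answer updates the code to Python 3 and fixes where the affiliation sits in the XML: I see it under MedlineCitation/Article/AuthorList/Author/AffiliationInfo (the text itself is in an Affiliation child of that element), not directly under MedlineCitation/Article/AuthorList/Author/Affiliation. Maybe the location changed over time? In this example we retrieve the author affiliations of just one paper, identified by its PMID (31888621, https://pubmed.ncbi.nlm.nih.gov/31888621/), and dump each Author element to show the structure: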

import xml.etree.ElementTree as ET
from urllib.request import urlopen

def chunker(seq, size):
    # Python 3: range replaces Python 2's xrange
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

# NCBI E-utilities are served over HTTPS
efetch = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=%s" % ('31888621')
handle = urlopen(efetch)
data = handle.read()

root = ET.fromstring(data)
for article in root.findall("PubmedArticle"):
    pmid = article.find("MedlineCitation/PMID").text
    year = article.find("MedlineCitation/Article/Journal/JournalIssue/PubDate/Year")
    if year is None: year = 'NA'
    else: year = year.text
    aulist = article.findall("MedlineCitation/Article/AuthorList/Author")
    # First author's <AffiliationInfo> element (an Element, not a string)
    affiliation = article.find("MedlineCitation/Article/AuthorList/Author/AffiliationInfo")
    #print(pmid, year, len(aulist), affiliation, aulist, ET.dump(root))
    for author in aulist:
        # ET.dump() writes the element to stdout itself and returns None,
        # so wrapping it in print() would only add a "None" line
        ET.dump(author)

Output:

<Author ValidYN="Y">
                    <LastName>Tang</LastName>
                    <ForeName>Lingkai</ForeName>
                    <Initials>L</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, S7N 5A9, Canada.</Affiliation>
                    </AffiliationInfo>
                </Author>

<Author ValidYN="Y">
                    <LastName>Mostafa</LastName>
                    <ForeName>Sakib</ForeName>
                    <Initials>S</Initials>
                    <AffiliationInfo>
                        <Affiliation>Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, S7N 5A9, Canada.</Affiliation>
                    </AffiliationInfo>
                </Author>

<Author ValidYN="Y">
                    <LastName>Liao</LastName>
                    <ForeName>Bo</ForeName>
                    <Initials>B</Initials>
                    <AffiliationInfo>
                        <Affiliation>School of Mathematics and Statistics, Hainan Normal University, Haikou, 571158, China.</Affiliation>
                    </AffiliationInfo>
                </Author>

<Author ValidYN="Y">
                    <LastName>Wu</LastName>
                    <ForeName>Fang-Xiang</ForeName>
                    <Initials>FX</Initials>
                    <Identifier Source="ORCID">0000-0002-4593-9332</Identifier>
                    <AffiliationInfo>
                        <Affiliation>Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, S7N 5A9, Canada. faw341@mail.usask.ca.</Affiliation>
                    </AffiliationInfo>
                    <AffiliationInfo>
                        <Affiliation>Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, S7N 5A9, Canada. faw341@mail.usask.ca.</Affiliation>
                    </AffiliationInfo>
                </Author>
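To go from the dumped XML to the actual strings for every author (the second half of the question), one option is to read the text of every AffiliationInfo/Affiliation under each Author, since an author can have more than one (Wu above has two). A minimal sketch, run on a trimmed copy of the record shown above instead of a live efetch response; author_affiliations is a hypothetical helper name, not part of any library. With the answer's code you could pass it the bytes in data directly, since ET.fromstring accepts bytes:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the efetch record for PMID 31888621 (two of its four authors)
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>31888621</PMID>
      <Article>
        <AuthorList>
          <Author ValidYN="Y">
            <LastName>Tang</LastName>
            <ForeName>Lingkai</ForeName>
            <AffiliationInfo>
              <Affiliation>Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, S7N 5A9, Canada.</Affiliation>
            </AffiliationInfo>
          </Author>
          <Author ValidYN="Y">
            <LastName>Wu</LastName>
            <ForeName>Fang-Xiang</ForeName>
            <AffiliationInfo>
              <Affiliation>Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, S7N 5A9, Canada. faw341@mail.usask.ca.</Affiliation>
            </AffiliationInfo>
            <AffiliationInfo>
              <Affiliation>Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, S7N 5A9, Canada. faw341@mail.usask.ca.</Affiliation>
            </AffiliationInfo>
          </Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def author_affiliations(xml_data):
    """Yield (pmid, author name, list of affiliation strings) for every author."""
    root = ET.fromstring(xml_data)
    for article in root.findall("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID", default="NA")
        for author in article.findall("MedlineCitation/Article/AuthorList/Author"):
            # Join whichever name parts are present (some entries may lack one)
            name = " ".join(part for part in (author.findtext("ForeName"),
                                              author.findtext("LastName")) if part)
            # One author can carry several <AffiliationInfo> blocks; keep them all
            affiliations = [aff.text.strip()
                            for aff in author.findall("AffiliationInfo/Affiliation")
                            if aff.text]
            yield pmid, name, affiliations


for pmid, name, affiliations in author_affiliations(SAMPLE):
    print(pmid, name, "|", "; ".join(affiliations))
```

To cover every paper returned by the question's esearch query, the same helper could be called on the data of each batch inside the question's for group in chunker(ids, 100) loop.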
answered 2020-07-18T21:56:27.393