python - Searching PubMed with BioPython and writing to CSV

Question

I am using BioPython to populate a CSV file with citation data from PubMed. So far I have written this:

import csv
from Bio import Entrez
import bs4

Entrez.email = "my_email"
CSVfile = open('srData.csv')
fileReader = csv.reader(CSVfile)
Data = list(fileReader)

with open('blank.csv','w') as f1:
  writer=csv.writer(f1, delimiter='\t',lineterminator='\n',)
  for id in Data:
    handle = Entrez.efetch(db="pubmed", id=id, rettype="gb", retmode="xml")
    record = Entrez.read(handle)
    title=record[0]['MedlineCitation']['Article']['ArticleTitle']
    abstract=record[0]['MedlineCitation']['Article']['Abstract']
    mesh =record[0]['MedlineCitation']['MeshHeadingList']
    descriptors = ','.join(term['DescriptorName'] for term in mesh)
    writer.writerow([title, abstract, descriptors])

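However, this produces unusual output: the titles, abstracts, and MeSH terms are spread across multiple columns instead of being separated, which I believe is due to their types. I want my CSV table to consist of three columns: one containing the title, one the abstract, and one the MeSH terms.

How can I achieve this?

Sample output

To clarify, the first column contains the entire title plus the beginning of the abstract, and the next several columns contain the subsequent parts of the abstract. I want these separated into distinct columns, i.e. the first column should contain only the title, the second only the abstract, and the third only the MeSH terms.

Currently, the first column contains: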

"Distinct and combined vascular effects of ACE blockade and HMG-CoA reductase inhibition in hypertensive subjects.  {u'AbstractText': ['Hypercholesterolemia and hypertension are frequently associated with elevated sympathetic activity. Both are independent cardiovascular risk factors and both affect endothelium-mediated vasodilation. To identify the effects of cholesterol-lowering and antihypertensive treatments on vascular reactivity and vasodilative capacity"

score 1 · Accepted Answer

The value of record[0]['MedlineCitation']['Article']['Abstract'] is a dictionary containing the abstract text and a shorter summary. If you want the actual abstract, then instead of:

abstract=record[0]['MedlineCitation']['Article']['Abstract']

you need:

abstract=record[0]['MedlineCitation']['Article']['Abstract']['AbstractText'][0]

Now abstract contains a string, which should be suitable for writing to your CSV file.
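One caveat: AbstractText is a list, and a structured abstract (with labelled sections such as OBJECTIVE or METHODS) has one entry per section, so indexing with [0] keeps only the first section. A minimal sketch of a helper that joins every section and tolerates records with no abstract is shown here. The function name abstract_text and the sample data are hypothetical, and plain dicts stand in for the parsed Entrez record, on the assumption that the parsed record supports the same dict-style access.

```python
def abstract_text(article):
    """Return the full abstract as one string, or '' if there is none."""
    abstract = article.get('Abstract')
    if not abstract:
        # Some PubMed records carry no abstract at all.
        return ''
    # AbstractText is a list; structured abstracts have one entry per section.
    return ' '.join(str(part) for part in abstract.get('AbstractText', []))


# Plain dicts standing in for record[0]['MedlineCitation']['Article']
# (illustrative data only, not a real PubMed record).
article = {
    'ArticleTitle': 'Example title',
    'Abstract': {'AbstractText': ['First section.', 'Second section.']},
}
print(abstract_text(article))
print(repr(abstract_text({'ArticleTitle': 'No abstract here'})))
```

In the loop from the question, this would be called as abstract_text(record[0]['MedlineCitation']['Article']) in place of the direct ['Abstract'] lookup.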

Update

I am unable to reproduce the error you described in the comments, even when using the same input data:

>>> from Bio import Entrez
>>> Entrez.email = '...'
>>> id=10067800
>>> handle = Entrez.efetch(db="pubmed", id=id, rettype="gb", retmode="xml")
>>> record = Entrez.read(handle)
>>> abstract=record[0]['MedlineCitation']['Article']['Abstract']['AbstractText'][0]
>>> abstract
StringElement('To assess the antihypertensive efficacy and safety of the novel AT1 receptor antagonist, telmisartan, compared with that of enalapril in elderly patients with mild to moderate hypertension.', attributes={u'NlmCategory': u'OBJECTIVE', u'Label': u'OBJECTIVE'})
>>>
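The StringElement in that session appears to be a str subclass that carries extra attributes, so it can be converted with str() and written as a single CSV field. The following is a rough sketch of building the three-column row, not a definitive implementation. FakeStringElement is a hypothetical stand-in defined here so the example runs without Biopython or network access, build_row is likewise an illustrative name, and the sample values are made up.

```python
import csv
import io


class FakeStringElement(str):
    """Hypothetical stand-in for Bio.Entrez's StringElement (a str with attributes)."""

    def __new__(cls, value, attributes=None):
        obj = super().__new__(cls, value)
        obj.attributes = attributes or {}
        return obj


def build_row(title, abstract, mesh):
    """Return [title, abstract, descriptors] as three plain strings."""
    descriptors = ','.join(str(term['DescriptorName']) for term in mesh)
    return [str(title), str(abstract), descriptors]


# Made-up values shaped like the fields used in the question.
row = build_row(
    FakeStringElement('Example title.'),
    FakeStringElement('Objective text.', {'Label': 'OBJECTIVE'}),
    [{'DescriptorName': FakeStringElement('Hypertension')},
     {'DescriptorName': FakeStringElement('Aged')}],
)

buffer = io.StringIO()  # in-memory file so the sketch does not touch disk
writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
writer.writerow(row)
print(buffer.getvalue())
```

Because the writer uses a tab delimiter, the commas between MeSH descriptors stay inside the third column. When viewing the result, open the file as tab-separated; a viewer that splits on commas will make the fields look as if they spill across columns.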
