So this works fine, but record.get("SO", "?") returns only the abbreviation of the journal
No, it doesn't. It won't even run, because of this line:
records = list(records)
records isn't defined at that point. And even if you fix that, all you get back from:
idlist = record["IdList"]
is a list of PubMed ID strings like: ['17510654', '2246389']
These are intended to be passed to an Entrez.efetch() call to get the actual data. So when you call record.get("SO", "?") on one of these ID strings, your code blows up (again), since a string has no get() method.
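To see that failure in isolation, here is a minimal sketch that needs no network access. It assumes only that the entries of the ID list behave like ordinary Python strings:

```python
# The esearch result only carries PubMed IDs, which behave like plain strings.
idlist = ['17510654', '2246389']  # the IDs shown above

try:
    # Strings have no dict-style get() method, so this raises AttributeError.
    idlist[0].get("SO", "?")
    message = "no error"
except AttributeError as error:
    message = str(error)

print(message)
```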
First, the "SO" (Source) field is defined to return the Journal Title Abbreviation (TA) as part of what it returns. You likely want "JT" (Journal Title) instead, as defined in MEDLINE/PubMed Data Element (Field) Descriptions. But neither of these has anything to do with this lookup.
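For reference, those two-letter tags only come into play when records are fetched in MEDLINE text format and parsed with Bio.Medline. Here is a minimal sketch of that route, assuming Biopython is installed. The helper names pick_fields and fetch_medline_fields are made up for illustration:

```python
def pick_fields(record):
    """Return (article title, full journal title) from a MEDLINE-style record."""
    # "TI" is the article title and "JT" the full journal title;
    # "?" is the fallback when a field is missing.
    return record.get("TI", "?"), record.get("JT", "?")


def fetch_medline_fields(id_list, email):
    """Fetch the given PubMed IDs as MEDLINE text and pick out TI and JT."""
    # Imported here so pick_fields() can be tried without Biopython installed.
    from Bio import Entrez, Medline

    Entrez.email = email
    handle = Entrez.efetch(db="pubmed", id=",".join(id_list),
                           rettype="medline", retmode="text")
    try:
        # Medline.parse() yields one dict-like record per article.
        return [pick_fields(record) for record in Medline.parse(handle)]
    finally:
        handle.close()
```

Calling fetch_medline_fields(['17510654', '2246389'], "my_email@gmail.com") should give one (title, journal) pair per ID, with the email replaced by your own address.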
Here's a rework of your code to get the article title and the title of the journal that it's in:
from Bio import Entrez
Entrez.email = "my_email@gmail.com" # change this to be your email address
handle = Entrez.esearch(db="pubmed", term="cancer AND wombats", retmax=20)
record = Entrez.read(handle)
handle.close()
for identifier in record['IdList']:
    # fetch the full record for this PubMed ID as XML
    pubmed_entry = Entrez.efetch(db="pubmed", id=identifier, retmode="xml")
    result = Entrez.read(pubmed_entry)
    pubmed_entry.close()
    article = result['PubmedArticle'][0]['MedlineCitation']['Article']
    print('"{}" in "{}"'.format(article['ArticleTitle'], article['Journal']['Title']))
OUTPUT
> python3 test.py
"Of wombats and whales: telomere tales in Madrid. Conference on telomeres and telomerase." in "EMBO reports"
"Spontaneous proliferations in Australian marsupials--a survey and review. 1. Macropods, koalas, wombats, possums and gliders." in "Journal of comparative pathology"
>
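The loop above makes one efetch request per ID. Since efetch accepts a comma-separated list of IDs, a single request should be enough. This is a sketch rather than a tested replacement, and format_article and fetch_articles are names invented here:

```python
def format_article(article):
    """Format one parsed Article element as '"title" in "journal"'."""
    return '"{}" in "{}"'.format(article['ArticleTitle'],
                                 article['Journal']['Title'])


def fetch_articles(id_list, email):
    """Fetch all IDs in one efetch call and return their Article elements."""
    # Imported here so format_article() can be tried without Biopython installed.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.efetch(db="pubmed", id=",".join(id_list), retmode="xml")
    try:
        result = Entrez.read(handle)
    finally:
        handle.close()
    # 'PubmedArticle' holds one entry per journal article that was returned.
    return [entry['MedlineCitation']['Article']
            for entry in result['PubmedArticle']]
```

With the record from the esearch call, usage would look like: for article in fetch_articles(record['IdList'], "my_email@gmail.com"): print(format_article(article))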
Details can be found in the document: MEDLINE PubMed XML Element Descriptions