1

我有一个 refseq ID (keys_list) 列表,用于使用 BioPython Entrez 下拉序列记录。我想只访问返回的 fasta 记录中的序列,但我不想将记录写入文件来这样做。

我正在尝试以下代码

for key in key_list:
   Entrez.email = "myemailaddress"
   handle = Entrez.efetch(db='nuccore', id=key, rettype='fasta')
   record = SeqIO.parse(handle, "fasta")
   for seq_record in SeqIO.parse(record, "fasta"):
    print seq_record.seq

当我运行它时,我收到了错误:

File "/usr/lib64/python2.6/site-packages/Bio/SeqIO/__init__.py", line 538, in parse
  yield r
File "/usr/lib64/python2.6/contextlib.py", line 34, in __exit__
  self.gen.throw(type, value, traceback)
File "/usr/lib64/python2.6/site-packages/Bio/File.py", line 59, in as_handle
  yield handleish
File "/usr/lib64/python2.6/site-packages/Bio/SeqIO/__init__.py", line 537, in parse
  for r in i:
File "/usr/lib64/python2.6/site-packages/Bio/SeqIO/FastaIO.py", line 37, in FastaIterator
  line = handle.readline()
AttributeError: 'generator' object has no attribute 'readline'

如果我返回整个记录handle.read(),我可以获得整个 fasta 记录,但在这个阶段我只想访问核苷酸序列。

谁能帮我解决这个问题?

提前谢谢了。

4

2 回答 2

1

这就是你需要的。

代替:

handle = Entrez.efetch(db='nuccore', id=key, rettype='fasta')

试试这个:

handle = Entrez.efetch(db="nucleotide", id=key, retmode="xml") # retmode as 'xml' , db='nucleotide'
features = Entrez.read(handle)[0]
sequence = features['GBSeq_sequence'] # this is your sequence!

它返回一个字符串,即您的序列:

'ggctcgcatctctccttcacgcgcccgccgccttacctgaggccgccatccacgccggttgagtcgcgttctgccgcctcccgcctgtggtgcctcctgaactacgtccgccgtctaggtaagtttagagctcaggtcgagaccgggcctttgtccggcgctcccttggagcctacctagactcagccggctctccacgctttgcctgaccctgcttgctcaactctacgtctttgtttcgttttctgttctgcgccgttacagatcgaaagttccacccctttccctttcattcacgactgactgccggcttggcccacggccaagtaccggcaactctgctggctcggagccagcgacagcccattctatagcactctccaggagagaaatttagtacacagttgggggctcgtccgggattcgagcgcccctttattccctaggcaatgggccaaatcttttcccgtagcgctagccctattccgcggccgccccgggggctggccgctcatcactggcttaacttcctccaggcggcatatcgcctagaacccggtccctccagttacgatttccaccagttaaaaaaatttcttaaaatagctttagaaacaccggtctggatctgccccattaactactccctcctagccagcctactcccaaaaggataccccggccgggtgaatgaaattttacacatactcatccaaacccaagcccagatcccgtcccgccccgcgccgccgccgccgtcatcctccacccacgaccccccggattctgacccacaaatcccccctccctatgttgagcctacagccccccaagtccttccagtcatgcacccacatggtgcccctcccaaccaccgcccatggcaaatgaaagacctacaggccattaagcaagaagtctcccaagcggcccctggaagcccccagtttatgcagaccatccggcttgcggtgcagcagtttgaccccactgccaaagacctccaagacctcctgcagtacctttgctcctccctcgtggcttccctccatcaccagcagctagatagccttatatcagaggccgaaactcgaggtattacaggttataaccccttagccggtcccctccgtgtccaagccaacaatccacaacaacaaggattaaggcgagaataccagcaactctggctcgccgccttcgccgccctgccagggagtgccaaagacccttcctgggcctctatcctccaaggcctggaggagccttaccacgccttcgtagaacgcctcaacatagctcttgacaatgggctgccagaaggcacgcccaaagaccccattttacgttccttagcctactctaatgcaaacaaagaatgccaaaaattactacaggcccgagggcacactaatagccctctaggagatatgttgcgggcttgtcaggcctggacccccaaagacaaaaccaaagtgttagttgtccagcctaaaaaaccccccccaaatcagccgtgcttccggtgcgggaaagcaggccactggagtcgggactgcactcagcctcgtcctccccctgggccatgccccctatgtcaagatccaactcactggaagcgagactgcccccgcctaaagcccactatcccagaaccagagccagaggaggatgccctcctattagatctccccgccgacatcccacacccaaaaaactccatagggggggaggtttaacctccccccccacattacagcaagtccttcctaaccaagacccaacatctattctgccagttataccgttagatcccgcccgtcggcccgtaattaaagcccagattgacacccagaccagccacccaaagactatcgaagctctactagatacaggagcagacatgacagtccttccgatagccttgttctcaagtaatactcccctcaaaaacacatccgtgttaggggcagggggccaaacccaagatcactttaagctcacctcccttcctgtgctaatacgcctccctttccggacgacgcctattgttttaacatcttgcctagttgataccaaaaacaactgggccatcataggtcgtgatgccttacaacaatgccaaggcgtcctgtacctccctgaggcaaaaaggccgcctgtaatcttgccaatacaggcgccagctgtccttgggctagaacacctcccaaggccccccgaaatcagccagttccctttaaaccagaacgcctccaggccttgcaacacttggtccggaaggccctggaggcaggccatatcgaaccctacaccgggccaggaaataacccagtattcccagttaaaaaagccaatggaacctggcgattcatccacgacctgcgggccactaactctctaaccatagatctctcatcatcttcccccgggccccctgacttgtccagcctgccaactacactagcccacttacaaactatagaccttaaagacgcctttttccaaatccccctacctaaacagttccagccctactttgctttcactgtcccacagcagtgtaactacggccccggcactagatacgcctggagagtactaccccaagggtttaaaaatagtcccaccctgttcgaaatgcagctggcccatatcctgcagcccattcggcaagccttcccccaatgcactattcttcagtacatggatgacattctcctggcaagcccctcccatgcggacctgcaactactctcagaggccacaatggcttccctaatctcccatgggttgcctgtgtccgaaaacaaaacccagcaaacccctggaacaattaagttcctagggcaaataatttcacctaatcacctcacttatgatgcagtccccaaggtacctatacggtcccgctgggcgctacctgaacttcaagccctacttggcgagattcagtgggtctccaaaggaactcctaccttacgccagccccttcacagtctctactgtgccttacaaaggcatactgatccccgagaccaaatatatttaaatccttctcaagttcaatcattagtgcagctgcggcaggccctgtcacagaactgccgcagtagactagtccaaaccctgcccctcctaggggctattatgctgaccctcactggcaccaccactgtggtgttccagtccaagcagcagtggccacttgtctggctacatgcccccctaccccacactagccagtgcccctgggggcagctacttgcctcagctgtgttattactcgacaaatacaccttgcaatcctatggactactctgccaaaccatacatcataacatctccacccaaaccttcaaccaattcattcaaacatctgaccaccccagtgttcctatcttactccaccacagtcaccgattcaaaaatttaggtgcccagactggagaactttggaacacttttcttaaaacaactgccccattggctcctgtgaaagcccttatgccagtgtttactctttcccctgtgatcataaacaccgccccttgcctgttttcagacggatccacctcccaggcagcctatattctctgggacaagcatatattgtcacaaagatcattcccccttccgccaccgcacaagtcggcccaacgggccgaacttctcggacttttgcatggcctctccagcgcccgttcgtggcgctgtctcaacatatttctagactccaagtatctttatcattaccttcggacccttgccctaggcaccttccaaggcaggtcctctcaggccccctttcaggccctcctgccccgcttactatcgcgtaaggtcgtctatttgcaccacgttcgcagccataccaatctacctgatcccatctccaggctcaacgctctcacagatgccctactaatcacccctgtcctgcagctctctcctgcagacctacacagtttcacccattgcggacagacggccctcacactgcaaggggcaaccacaactgaggcctccaatatcctgcgctcttgccacgcctgccgcaaaaataacccacaacatcagatgcctcaaggacacatccgccgtggcctactccctaaccacatctggcaaggcgacattacccatttcaaatataaaaatacactgtatcgccttcatgtatgggtagacaccttttcaggagccatctcagctacccaaaagagaaaagaaacaagctcagaagctatttcctctttgctccaggccattgcctatctaggcaagcctagctacataaacacagacaatggccctgcctatatttcccaagacttcctcaatatgtgtacctcccttgctattcgccatactacccatgtcccctacaatccaaccagctccggacttgtagaacgctctaatggcattcttaaaaccctattatataagtactttactgacaaacccgacctacctatggataatgctctatccatagccctatggacaatcaaccacctaaatgtattaaccaactgccacaaaacccgatggcagcttcaccactccccccgactccagccgatcccagagacacattccctcagcaataaacaaacccattggtattatttcaagcttcctggtcttaatagccgccagtggaaaggaccacaggaggctcttcaagaagctgccggcgctgctctcatcccggtaagcgctagttctgcccagtggatcccgtggaggctcctcaagcgagctgcatgcccaagacccgtcggaggccccgccgatcccaaagaaaaagaccaccaacaccatgggtaagtttctcgccactttgattttattcttccagttctgccccctcatcctcggtgattacagccccagctgctgtactctcacagttggagtctcctcataccactctaaaccctgcaatcctgcccagccagtttgttcatggaccctcgacctgctggccctttcagcagatcaggccctacagccaccctgccctaatctagtaagttactccagctaccatgccacctattccctatatctattccctcattggatcaaaaagccaaaccgaaatggcggaggctattattcagcctcttattcagacccttgttccttaaaatgcccatacctagggtgccaatcatggacctgcccctatacaggagccgtctccagcccctactggaaatttcagcaagatgtcaattttactcaagaagtttcacacctcaatattaatctccatttttcaaaatgcggtttttccttctcccttctagtcgacgctccaggatatgaccccatctggttccttaataccgaacccagccaactgcctcccaccgcccctcctctactctcccactctaacctagaccatatcctcgagccctctataccatggaaatcaaaactcctgactcttgtccagttaaccctacaaagcactaattatacttgcattgtctgtatcgatcgtgccagcctatccacttggcacgtcctatactctcccaacgtctctgttccatccccttcttctacccccctcctttacccatcgttagcgcttccagccccccacctgacgttaccatttaactggacccactgctttgacccccagattcaagctatagtctcctccccctgtcataactccctcatcctgccccccttttccttgtcacctgttcccacgctaggatcccgctcccgccgagcagtaccggtggcggtctggcttgtctccgccctggccatgggagccggagtggctggcaggattaccggctccatgtccctcgcctcaggaaagagcctcctacatgaggtggacaaagatatttcccaattaactcaagcaatagtcaaaaaccacaaaaatctgctcaaaattgcacagtatgctgcccagaacagacgaggccttgatctcctgttctgggagcaaggaggattatgcaaagcattacaagaacagtgctgttttctaaatattactaattcccatgtctcaatactacaagagagacccccccttgaaaatcgagtcctgactggctggggccttaactgggaccttggcctctcacagtgggctcgagaagccttacaaactggaatcacccttgtcgcgctactccttcttgttatccttgcaggaccatgcatcctccgtcagctacgacacctcccctcgcgcgtcagatacccccattactctcttataaaccctgagtcatccctgtaaaccaagcacacaattattgcaaccacatcgcctccagcctcccctgccaataattaacctctcccatcaaatcctccttctcctgcagcaacctcctccgttcagcctccaaggactccacctcgccttccaactgtctagtatagccatcaacccccaactcctgcattttttctttcctagcactatgctgtttcgccttctcagccccttgtctccacttgcgctcacggcgctcctgctcttcctgctttctccgggcgaagtcagcggccttctcctccgcccgcttcctgcgccgtgccttctcctcttccttccttttcaaatactcagcaatctgcttttcctcctctttctcccgctctttttttcgcttcctcttctcctcagcccgtcgctgccgatcacgatgcgtttccccgcgaggtggcgctttcccccctggagggccccgtcgcagccggccgcggctttcctcttctagagatagcaaaccgtcaagcacagtttcctcctcctccttgtcctttaactcttcctccaaggataatagcccgtccaccaattcctccaccagcaggtcctccgggcatggaacaggcaaacatcgaaacagccctacggatacaaagttaaccatgcttattatcagcccacttcccagggtttggacagagtcttcttttcggatacccagtctacgtgtttggagactgtgtacaaggcgactggtgccccatctctgggggactatgttcggcccgcctacatcgtcacgccctactggccacctgtccagagcatcagatcacctgggaccccatcgatggacgcgttatcggctcagctctacagttccttatccctcgactcccctccttccccacccagagaacctctaagacccttaaggtccttaccccgccaatcactcatacaacccccaacattccaccctccttcctccaggccatgcgcaaatactcccccttccgaaatggatacatggaacccacccttgggcagcacctcccaaccctgtcttttccagaccccggactccggccccaaaacctgtacaccctctggggaggctccgttgtctgcatgtacctctaccagctttccccccccatcacctggcccctcctgccccatgtgattttttgccaccccggccagctcggggccttcctcaccaatgttccctacaaacgaatagaaaaactcctctataaaatttcccttaccacaggggccctaataattctacccgaggactgtttgcccaccacccttttccagcctgctagggcacccgtcacgctgacagcctggcaaaacggcctccttccgttccactcaaccctcaccactccaggccttatttggacatttaccgatggcacgcctatgatttccgggccctgccctaaagatggccagccatctttagtactacagtcctcctcctttatatttcacaaatttcaaaccaaggcctaccacccctcatttctactctcacacggcctcatacagtactcttcctttcataatttgcatctcctatttgaagaatacaccaacatccccatttctctactttttaacgaaaaagaggcagatgacaatgaccatgagccccaaatatcccccgggggcttagagcctctcagtgaaaaacatttccgtgaaacagaagtctgagaaggtcagggcccagaataaggctctgacgtctccccccggaggacagctcagcaccagctcaggctaggccctgacgtgtccccctaaagacaaatcataagctcagacctccgggaagccaccgggaaccacccatttcctccccatgtttgtcaagccgtcctcaggcgttgacgacaacccctcacctcaaaaaacttttcatggcacgcatacggctcaataaaataacaggagtctataaaagcgtggggacagttcaggagggggctcgcatctctccttcacgcgcccgccgccttacctgaggccgccatccacgccggttgagtcgcgttctgccgcctcccgcctgtggtgcctcctgaactacgtccgccgtctaggtaagtttagagctcaggtcgagaccgggcctttgtccggcgctcccttggagcctacctagactcagccggctctccacgctttgcctgaccctgcttgctcaactcta'
于 2013-11-23T12:29:03.477 回答
0

我相当肯定,当您使用 biopython 解析 fasta 文件时,它会将信息组织到字典中。您可以通过打印检查所有内容的组织方式

print dir(seq_record)

我知道在解析 genbank 文件时,每个 seq_record 都有一个名为 features 的字典,因此对于 FASTA 文件,假设它的组织方式相同,您可以通过以下方式访问序列

for record in SeqIO.parse(handle, "fasta"):
    for f in record.features:
        print "sequence"
        print dir(f) # Print the attributes of f to make sure that "sequence" used in the above line is in fact a key in the dictionary, if not pick the correct key to use above
于 2013-07-23T21:19:03.643 回答