python - AlignIO cannot find records in a FASTA file

Question

I want to start using Biopython to align sequence files, but the library keeps giving me errors. My code is as follows:

from Bio import AlignIO
import Bio

alignment = AlignIO.read("A_prot.fasta","fasta")
print alignment

I made sure that A_prot.fasta is in the same directory as my program, but I get this error message:

Traceback (most recent call last):
  File "bio_seq_align.py", line 5, in <module>
    alignment = AlignIO.read("A_prot.fasta","fasta")
  File "/usr/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-i686.egg/Bio/AlignIO/__init__.py", line 427, in read
    raise ValueError("No records found in handle")
ValueError: No records found in handle

score 0 · Accepted Answer

One reason you might get "ValueError: No records found in handle" is that the file on your machine is actually empty.
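
A quick way to rule that out is to inspect the file before handing it to Biopython. The following is only a sketch using the standard library: the helper name `count_fasta_headers` is made up for illustration, and it assumes that every FASTA record starts with a line beginning with ">".

```python
import os


def count_fasta_headers(path):
    """Return (size_in_bytes, number_of_lines_starting_with_'>')."""
    size = os.path.getsize(path)
    headers = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                headers += 1
    return size, headers


if __name__ == "__main__":
    path = "A_prot.fasta"  # file name taken from the question
    if os.path.exists(path):
        size, headers = count_fasta_headers(path)
        print("%d bytes, %d FASTA header lines" % (size, headers))
    else:
        print("%s does not exist in the current directory" % path)
```

A result of zero bytes or zero header lines would mean there is nothing for the parser to find, for example because a download failed and left an empty file behind.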

This is what happened for me with the ftp://ftp.ebi.ac.uk/pub/databases/ipd/imgt/hla/A_prot.fasta file you linked to in your comment above:

>>> from Bio import AlignIO
>>> align = AlignIO.read("A_prot.fasta", "fasta")
Traceback (most recent call last):
...
ValueError: Sequences must all be the same length

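This is the expected result: the FASTA file is not a set of aligned sequences. If you want to load it as an alignment, first run an alignment tool such as MUSCLE or Clustal Omega. However, having looked at the file and the range of lengths, I doubt that this makes sense for this example: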

>>> from Bio import SeqIO
>>> lengths = set(len(record) for record in SeqIO.parse("A_prot.fasta", "fasta"))
>>> lengths
set([17, 19, 26, 50, 51, 53, 59, 65, 66, 71, 72, 73, 74, ..., 364, 365])
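
The same length check can be wrapped in a small helper that tells you up front whether a set of records could be read as an alignment. This is a sketch, not part of Biopython: the name `alignment_ready` is hypothetical, and it relies only on each record supporting `len()`, which both `SeqRecord` objects and plain strings do.

```python
def alignment_ready(records):
    """Return (ok, lengths).

    ok is True only when there is at least one record and all records
    share a single length; lengths is the sorted list of distinct lengths.
    """
    lengths = sorted(set(len(record) for record in records))
    return len(lengths) == 1, lengths


if __name__ == "__main__":
    # Plain strings stand in for SeqRecord objects here (toy data).
    print(alignment_ready(["MAVM", "MAV-", "M-VM"]))
    print(alignment_ready(["MAVMAPRT", "MAV"]))
```

With Biopython installed, `records` could be `SeqIO.parse("A_prot.fasta", "fasta")`. An empty list of lengths corresponds to the "No records found" case, and more than one length corresponds to the "Sequences must all be the same length" case.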

score 0 · Accepted Answer

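peterjc makes a good point that AlignIO requires aligned sequences of the same length. If you want to read in a FASTA file containing unaligned sequences, you can use SeqIO as follows: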

>>> from Bio import SeqIO
>>> handle = open("A_prot.fasta", "rU")
>>> print handle
<open file 'A_prot.fasta', mode 'rU' at 0x13fc1d8>
>>>

To read the sequences into a dictionary, you can use the following:

>>> record_dict = SeqIO.to_dict(SeqIO.parse(handle, "fasta"))
>>> print len(record_dict)
2186 # Fasta file contains 2186 entries
>>>

In this case the record ID becomes the key. To access the information associated with a particular key, use:

>>> record_dict['HLA:HLA00001']
SeqRecord(seq=Seq('MAVMAPRTLLLLLSGALALTQTWAGSHSMRYFFTSVSRPGRGEPRFIAVGYVDD...CKV', SingleLetterAlphabet()), id='HLA:HLA00001', name='HLA:HLA00001', description='HLA:HLA00001 A*01:01:01:01 365 bp', dbxrefs=[])
>>>
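
To make the "record ID becomes the key" behaviour concrete, here is a rough pure-Python sketch of the idea. It is not Biopython's implementation: `Record` is a hypothetical stand-in for `SeqRecord`, `index_by_id` is a made-up name, and the IDs and sequences are toy data. As far as I know, `SeqIO.to_dict` likewise raises a `ValueError` when two records share a key, which the sketch mimics.

```python
from collections import namedtuple

# Hypothetical stand-in for Bio.SeqRecord.SeqRecord, reduced to two fields.
Record = namedtuple("Record", ["id", "seq"])


def index_by_id(records):
    """Build {record.id: record}, refusing duplicate IDs."""
    by_id = {}
    for record in records:
        if record.id in by_id:
            raise ValueError("Duplicate key %r" % (record.id,))
        by_id[record.id] = record
    return by_id


if __name__ == "__main__":
    # Toy records; the IDs and sequences are invented for this example.
    toy = [Record("demo:1", "MAVMAP"), Record("demo:2", "MAV")]
    by_id = index_by_id(toy)
    print(by_id["demo:1"].seq)
    # .get() avoids a KeyError when an ID is not present.
    print(by_id.get("demo:3"))
```

Using `.get()` instead of square brackets is a convenient way to probe for an ID that may not be in the file.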

For more information, see the AlignIO and SeqIO documentation.
