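I'm very new to Python, but I've been using it to extract gene sequences from GenBank files. The problem is that sometimes I get the output I want (the sequence printed to a file) and sometimes it raises a KeyError. It depends on which accession I use. Does anyone know why it sometimes gives a KeyError? I think it may have something to do with the GenBank records themselves, but they look very similar and the gene is there (in the gene feature qualifiers). E.g. it works for HG738867.1 but not for AP019703.1. Here is my code -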
from Bio import Entrez, SeqIO
gi_genome = 'accession'
name = 'acrA'
Entrez.email = 'email'
handle = Entrez.efetch(db="nucleotide", id=gi_genome, rettype="gbwithparts", retmode="text")
record = SeqIO.read(handle, "gb")
handle.close()
element = 0
for feature in record.features:
    if feature.type == 'CDS' and name in feature.qualifiers["gene"]:
        report = 'record.features[%s]' % str(element)
        gene_sequence = feature.extract(record.seq)
        with open('output.fasta', 'a') as f:
            print('>' + gi_genome + ' ' + name, file=f)
            print(gene_sequence, file=f)
        break
    else:
        element = element + 1
Here is the traceback -
Traceback (most recent call last):
  File "/home/ubuntu/Documents/Git_Branches/Project_planning/Learning/In_progress/utils/data.py", line 11, in <module>
    if feature.type == 'CDS' and name in feature.qualifiers["gene"]:
KeyError: 'gene'
Process finished with exit code 1
Thanks in advance!
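
A likely cause, offered as a guess and not a confirmed diagnosis of AP019703.1: the loop indexes `feature.qualifiers["gene"]` on every CDS it visits before reaching the one wanted, so a single earlier CDS that has no "gene" qualifier (for example one annotated only with a locus_tag) is enough to raise KeyError. Below is a minimal sketch of a lookup that tolerates the missing key by using `dict.get` with an empty-list default. The helper name `find_cds_by_gene` and the `SimpleNamespace` stand-in features are hypothetical and exist only so the example runs without a network call; in the script above the same function would be given `record.features`.

```python
from types import SimpleNamespace


def find_cds_by_gene(features, gene_name):
    """Return the first CDS whose "gene" qualifier lists gene_name, else None.

    Hypothetical helper: it only assumes each feature has a .type string and
    a dict-like .qualifiers mapping qualifier names to lists of strings.
    """
    for feature in features:
        # .get() yields an empty list when the "gene" qualifier is absent,
        # so the membership test is simply False instead of raising KeyError.
        if feature.type == "CDS" and gene_name in feature.qualifiers.get("gene", []):
            return feature
    return None


# Made-up stand-ins for parsed features (not real GenBank data).
features = [
    SimpleNamespace(type="source", qualifiers={"mol_type": ["genomic DNA"]}),
    SimpleNamespace(type="CDS", qualifiers={"locus_tag": ["X_0001"]}),  # no "gene" key
    SimpleNamespace(type="CDS", qualifiers={"gene": ["acrA"], "locus_tag": ["X_0002"]}),
]

match = find_cds_by_gene(features, "acrA")
if match is None:
    print("no CDS with that gene name")
else:
    print(match.qualifiers["locus_tag"][0])
```

If the function returns None for a real record, the gene may be annotated under a different qualifier (such as "product" or "locus_tag") in that accession, which would need checking in the record itself.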