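I want to save some DNA sequences in GenBank file format so that the files include information about genes, domains, and so on. I know how to create a SeqRecord object holding all the information I want in the file: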
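# imports (Bio.Alphabet exists only in Biopython releases before 1.78)
from Bio import SeqIO
from Bio.Alphabet import IUPAC
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.SeqRecord import SeqRecord
#my DNA sequence and encoded protein sequence of gene1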
genome_seq = 'ATTTTGTGCAGCCGAGAGCGCGAGCGAAGCGCTTAAAAAATTCCCCCGCTCTGTTCTCCGGCAGGACACAAAGTCATGCCGTGGAGACCGCCGGTCCATAACGTGCCAGGTAGAGAGAATCAATGGTTTGCAGCGTTCTTTCACGGTCATGCTGCTTTCTGCGGGTGTGGTGACCCTGTTGGGCATCTTAACGGAAGC'
protein_seq = 'QQRILGVKLRLLFNQVQKIQQNQDP'
#position of gene1
start = 12
end = start + len(protein_seq)
#some information
name = 'my_contig'
bioproject = 'BodySites'
sample_type='blood'
taxonomy = ['Homo Sapiens']
reference_prot_ID = 'YP_92845z2093857'
#dictionaries with information for SeqFeature qualifiers and SeqRecord annotations
dict1 = {'gene':'ORF1', 'ref_ID': reference_prot_ID, 'translation':protein_seq}
dict2 = {'SOURCE': sample_type, 'ORGANISM': 'Human', 'Taxonomy':taxonomy}
#create SeqFeature and SeqRecord
f1 = SeqFeature(FeatureLocation(start, end, strand=1), type='domain', qualifiers=dict1)
my_features = [f1]
record = SeqRecord(Seq(genome_seq, alphabet=IUPAC.unambiguous_dna), id=name, name=name,
                   description=bioproject, annotations=dict2, features=my_features)
print(record)
with open('/media/sf_Desktop/test.gb', 'w') as handle:
    SeqIO.write(record, handle, 'genbank')
What gets printed to the screen for the SeqRecord object looks like this, and it seems to contain everything:
ID: my_contig
Name: my_contig
Description: BodySites
Number of features: 1
/SOURCE=blood
/ORGANISM=Human
/Taxonomy=['Homo Sapiens']
Seq('ATTTTGTGCAGCCGAGAGCGCGAGCGAAGCGCTTAAAAAATTCCCCCGCTCTGT...AGC', IUPACUnambiguousDNA())
But in the resulting file, the SOURCE, ORGANISM and Taxonomy information is missing:
LOCUS       my_contig                198 bp    DNA              UNK 01-JAN-1980
DEFINITION  BodySites.
ACCESSION   my_contig
VERSION     my_contig
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     domain          13..37
                     /gene="ORF1"
                     /ref_ID="YP_92845z2093857"
                     /translation="QQRILGVKLRLLFNQVQKIQQNQDP"
ORIGIN
        1 attttgtgca gccgagagcg cgagcgaagc gcttaaaaaa ttcccccgct ctgttctccg
       61 gcaggacaca aagtcatgcc gtggagaccg ccggtccata acgtgccagg tagagagaat
      121 caatggtttg cagcgttctt tcacggtcat gctgctttct gcgggtgtgg tgaccctgtt
      181 gggcatctta acggaagc
//
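Can anyone help me get the annotation information into the output file?
I found that the GenBank.Record module can hold all of this information and looks very nice when printed to the screen, but there is no information on how to save a Record object to a file...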
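One thing that may be worth checking is the spelling of the annotation keys. As far as I can tell, Biopython's GenBank writer looks up the lowercase keys `source`, `organism` and `taxonomy` in `record.annotations`, so keys such as `SOURCE` or `Taxonomy` would be skipped without any warning. The snippet below is only a minimal sketch under some assumptions: it targets Biopython 1.78 or later (no `alphabet` argument, and a `molecule_type` annotation instead), it uses a shortened placeholder sequence rather than the contig above, and it writes to an in-memory `StringIO` handle instead of a file on disk.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Shortened placeholder sequence (assumption: any DNA string works here).
genome_seq = "ATTTTGTGCAGCCGAGAGCGCGAGCGAAGCGCTTAAAAAATTCCCCCGCTCTGTTCTCCG"

# Lowercase keys are the ones the GenBank writer appears to read.
# 'molecule_type' is needed for GenBank output in Biopython 1.78+.
annotations = {
    "molecule_type": "DNA",
    "source": "blood",
    "organism": "Human",
    "taxonomy": ["Homo Sapiens"],  # list of strings, joined with '; '
}

record = SeqRecord(
    Seq(genome_seq),
    id="my_contig",
    name="my_contig",
    description="BodySites",
    annotations=annotations,
)

# Write to an in-memory handle so the result can be inspected directly.
handle = StringIO()
SeqIO.write(record, handle, "genbank")
genbank_text = handle.getvalue()
print(genbank_text)
```

If this behaves as expected, the SOURCE and ORGANISM lines and the taxonomy line below them should carry the values from the dictionary instead of a lone period. Replacing the `StringIO` handle with `open('/media/sf_Desktop/test.gb', 'w')` would write the same text to disk.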