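I want to save some DNA sequences in GenBank file format so that they include information about genes, domains, etc. I know how to create a SeqRecord object and put all the information I want into it: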

from Bio import SeqIO  
from Bio.Alphabet import IUPAC  # Bio.Alphabet only exists in Biopython < 1.78  
from Bio.Seq import Seq  
from Bio.SeqFeature import SeqFeature, FeatureLocation  
from Bio.SeqRecord import SeqRecord  
  
# my DNA sequence and the encoded protein sequence of gene1  
genome_seq = 'ATTTTGTGCAGCCGAGAGCGCGAGCGAAGCGCTTAAAAAATTCCCCCGCTCTGTTCTCCGGCAGGACACAAAGTCATGCCGTGGAGACCGCCGGTCCATAACGTGCCAGGTAGAGAGAATCAATGGTTTGCAGCGTTCTTTCACGGTCATGCTGCTTTCTGCGGGTGTGGTGACCCTGTTGGGCATCTTAACGGAAGC'  
protein_seq = 'QQRILGVKLRLLFNQVQKIQQNQDP'  
# position of gene1  
start = 12  
end = start + len(protein_seq)  
# some information  
name = 'my_contig'  
bioproject = 'BodySites'  
sample_type = 'blood'  
taxonomy = ['Homo Sapiens']  
reference_prot_ID = 'YP_92845z2093857'  
# dictionaries with information for the SeqFeature qualifiers and SeqRecord annotations  
dict1 = {'gene': 'ORF1', 'ref_ID': reference_prot_ID, 'translation': protein_seq}  
dict2 = {'SOURCE': sample_type, 'ORGANISM': 'Human', 'Taxonomy': taxonomy}  
# create the SeqFeature and SeqRecord  
f1 = SeqFeature(FeatureLocation(start, end, strand=1), type='domain', qualifiers=dict1)  
my_features = [f1]  
record = SeqRecord(Seq(genome_seq, alphabet=IUPAC.unambiguous_dna), id=name, name=name,  
                   description=bioproject, annotations=dict2, features=my_features)  
print(record)  
# write the record to a GenBank file  
with open('/media/sf_Desktop/test.gb', 'w') as handle:  
    SeqIO.write(record, handle, 'genbank')

This is what gets printed on screen for the SeqRecord object, and it appears to contain everything:

ID: my_contig
Name: my_contig
Description: BodySites
Number of features: 1
/SOURCE=blood
/ORGANISM=Human
/Taxonomy=['Homo Sapiens']
Seq('ATTTTGTGCAGCCGAGAGCGCGAGCGAAGCGCTTAAAAAATTCCCCCGCTCTGT...AGC', IUPACUnambiguousDNA())

But in the resulting file, the SOURCE, ORGANISM and Taxonomy information is missing:

LOCUS       my_contig                198 bp    DNA              UNK 01-JAN-1980
DEFINITION  BodySites.
ACCESSION   my_contig
VERSION     my_contig
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     domain          13..37
                     /gene="ORF1"
                     /ref_ID="YP_92845z2093857"
                     /translation="QQRILGVKLRLLFNQVQKIQQNQDP"
ORIGIN
        1 attttgtgca gccgagagcg cgagcgaagc gcttaaaaaa ttcccccgct ctgttctccg
       61 gcaggacaca aagtcatgcc gtggagaccg ccggtccata acgtgccagg tagagagaat
      121 caatggtttg cagcgttctt tcacggtcat gctgctttct gcgggtgtgg tgaccctgtt
      181 gggcatctta acggaagc
//

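Can anyone help me get the annotation information into the output file?
I found that the GenBank.Record module can hold all of this information and it looks very nice on screen, but there is no information on how to save a Record object to a file...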

1 Answer

OK, I found my mistake: all annotation keys must be lowercase. Changing 'SOURCE' to 'source', 'ORGANISM' to 'organism', and so on did the job.
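For anyone who would rather not rename the keys by hand, here is a minimal sketch of the fix in plain Python. The helper name `lowercase_annotation_keys` is my own invention for illustration and is not part of Biopython; it just returns a copy of the annotations dict with every key lower-cased.

```python
def lowercase_annotation_keys(annotations):
    """Return a copy of an annotations dict with every key lower-cased.

    Hypothetical helper for illustration only; not a Biopython function.
    """
    return {key.lower(): value for key, value in annotations.items()}


# The annotations dict from the question, with upper- and mixed-case keys.
dict2 = {'SOURCE': 'blood', 'ORGANISM': 'Human', 'Taxonomy': ['Homo Sapiens']}

# Lower-cased copy: 'source', 'organism' and 'taxonomy' are the key names
# that ended up in the GenBank output once I renamed them.
fixed = lowercase_annotation_keys(dict2)
print(fixed)
```

Passing `annotations=fixed` instead of `annotations=dict2` when building the SeqRecord should then fill in the SOURCE and ORGANISM lines. One caveat, as far as I know: on Biopython 1.78 and later the `alphabet` argument is gone, and the GenBank writer expects a `'molecule_type'` entry (for example `'DNA'`) in the annotations instead.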

Cheers!

answered 2020-12-15T08:49:01.967