I have a sequence that has homologs in several species, along with a score for each homolog.
Here is an example record from the GFF file:
4592637 Beutenbergia_cavernae_DSM_12333 TILL 70731 70780 . 0 . clst_id=429;SubjectOrganism=Thermofilum_pendens_Hrk_5;SubjectScore=0.343373493975904;SubjectOrganism=Ignicoccus_hospitalis_KIN4_I;SubjectScore=0.323293172690763;SubjectOrganism=Burkholderia_pseudomallei_MSHR346;SubjectScore=0.343373493975904;SubjectOrganism=Burkholderia_mallei_SAVP1;SubjectScore=0.343373493975904;SubjectOrganism=Enterobacter_638;SubjectScore=0.343373493975904;SubjectOrganism=Rickettsia_felis_URRWXCal2;SubjectScore=0.343373493975904;SubjectOrganism=Gemmatimonas_aurantiaca_T_27;SubjectScore=0.343373493975904;SubjectOrganism=Streptomyces_coelicolor;SubjectScore=0.363453815261044;SubjectOrganism=Beutenbergia_cavernae_DSM_12333;SubjectScore=1;SubjectOrganism=Kocuria_rhizophila_DC2201;SubjectScore=0.343373493975904;SubjectOrganism=Rhodococcus_jostii_RHA1;SubjectScore=0.383534136546185;SubjectOrganism=Symbiobacterium_thermophilum_IAM14863;SubjectScore=0.363453815261044;
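==>4592637 => NAPP (Nucleic Acid Phylogenetic Profiling database) sequence ID (not a GenBank ID)
==>Beutenbergia_cavernae_DSM_12333 => name of the species the sequence comes from
==>TILL => sequence type
==>70731 .. 70780 => start and end of the sequence
==>clst_id=429 => cluster ID of this sequence
==>SubjectOrganism => name of a species in which the sequence has a homolog
==>SubjectScore => score of the homolog in that species (Blastn score)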
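
To make the layout concrete, here is a minimal sketch of parsing one such record in Python. It assumes the nine columns are separated by whitespace (as in the example above) and that every SubjectScore immediately follows the SubjectOrganism it belongs to. The function name `parse_record` and the keys of the returned dictionary are my own choices, not part of NAPP or of any library.

```python
from typing import Dict, List, Optional, Tuple


def parse_record(line: str) -> Dict[str, object]:
    """Split one record into its fixed columns and its (organism, score) pairs."""
    # Assumption: 9 whitespace-separated columns; the 9th holds the attributes.
    fields = line.strip().split(None, 8)
    if len(fields) != 9:
        raise ValueError("expected 9 columns, got %d" % len(fields))
    seq_id, organism, seq_type, start, end, _score, _strand, _phase, attrs = fields

    cluster_id: Optional[str] = None
    homologs: List[Tuple[str, float]] = []
    pending: Optional[str] = None  # organism still waiting for its score

    for item in attrs.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the trailing ';'
        key, _, value = item.partition("=")
        if key == "clst_id":
            cluster_id = value
        elif key == "SubjectOrganism":
            pending = value
        elif key == "SubjectScore" and pending is not None:
            # Assumption: a score always follows the organism it describes.
            homologs.append((pending, float(value)))
            pending = None

    return {
        "seq_id": seq_id,
        "organism": organism,
        "type": seq_type,
        "start": int(start),
        "end": int(end),
        "cluster_id": cluster_id,
        "homologs": homologs,
    }


# Shortened copy of the record above (only two of the organism/score pairs).
example = (
    "4592637 Beutenbergia_cavernae_DSM_12333 TILL 70731 70780 . 0 . "
    "clst_id=429;"
    "SubjectOrganism=Thermofilum_pendens_Hrk_5;SubjectScore=0.343373493975904;"
    "SubjectOrganism=Beutenbergia_cavernae_DSM_12333;SubjectScore=1;"
)

record = parse_record(example)
for name, score in record["homologs"]:
    print(name, score)
```

Because SubjectOrganism and SubjectScore repeat as keys, the attributes cannot simply be loaded into a dictionary; the sketch keeps them as an ordered list of pairs instead.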
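For each SubjectOrganism, I want to extract the region that is similar to this sequence (4592637).
How can I use Python to extract these sequences from the genomes in which the sequence has a homolog?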
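
As a starting point, the sketch below only cuts an interval out of a genome given as FASTA text, which is enough to get the query region 70731..70780 from the source genome. The record contains no coordinates for the homologs in the subject genomes, so those would still have to be located first, for example by aligning the query against each subject genome. The helper names `read_fasta` and `extract_region` are hypothetical, the demo genome is made up, and the coordinates are assumed to be 1-based and inclusive, as in GFF.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA lines into {sequence id: sequence}.

    `lines` can be an open file handle or any iterable of strings.
    The id is the first word after '>'.
    """
    sequences: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            sequences[current] = []
        elif current is not None:
            sequences[current].append(line)
    return {name: "".join(parts) for name, parts in sequences.items()}


def extract_region(sequence: str, start: int, end: int) -> str:
    """Return sequence[start..end], assuming 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("interval %d..%d is outside the sequence" % (start, end))
    return sequence[start - 1:end]


# Tiny made-up genome, used only to show the calls.
demo_lines = [">chr1 demo genome", "ACGT", "ACGT", ">chr2", "TTTT"]
genome = read_fasta(demo_lines)
print(extract_region(genome["chr1"], 2, 4))
```

With a real genome file, the same calls would be `read_fasta(open(path))` followed by `extract_region(seq, 70731, 70780)`, where `path` and the choice of which FASTA entry to use are things I would still need to work out.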