I have a GFF file that looks like this:
contig1 loci gene 452050 453069 15 - . ID=dd_g4_1G94;
contig1 loci mRNA 452050 453069 14 - . ID=dd_g4_1G94.1;Parent=dd_g4_1G94
contig1 loci exon 452050 452543 . - . ID=dd_g4_1G94.1.exon1;Parent=dd_g4_1G94.1
contig1 loci exon 452592 453069 . - . ID=dd_g4_1G94.1.exon2;Parent=dd_g4_1G94.1
contig1 loci mRNA 452153 453069 15 - . ID=dd_g4_1G94.2;Parent=dd_g4_1G94
contig1 loci exon 452153 452543 . - . ID=dd_g4_1G94.2.exon1;Parent=dd_g4_1G94.2
contig1 loci exon 452592 452691 . - . ID=dd_g4_1G94.2.exon2;Parent=dd_g4_1G94.2
contig1 loci exon 452729 453069 . - . ID=dd_g4_1G94.2.exon3;Parent=dd_g4_1G94.2
###
I would like to rename the IDs, numbering from 0001, so that the entries for the gene above become:
contig1 loci gene 452050 453069 15 - . ID=dd_0001;
contig1 loci mRNA 452050 453069 14 - . ID=dd_0001.1;Parent=dd_0001
contig1 loci exon 452050 452543 . - . ID=dd_0001.1.exon1;Parent=dd_0001.1
contig1 loci exon 452592 453069 . - . ID=dd_0001.1.exon2;Parent=dd_0001.1
contig1 loci mRNA 452153 453069 15 - . ID=dd_0001.2;Parent=dd_0001
contig1 loci exon 452153 452543 . - . ID=dd_0001.2.exon1;Parent=dd_0001.2
contig1 loci exon 452592 452691 . - . ID=dd_0001.2.exon2;Parent=dd_0001.2
contig1 loci exon 452729 453069 . - . ID=dd_0001.2.exon3;Parent=dd_0001.2
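The example above shows a single gene, but I want to rename every gene, together with its corresponding mRNAs and exons, consecutively starting from ID=dd_0001. Any tips on how to do this would be greatly appreciated.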
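One possible approach, shown here as a minimal Python sketch and not as a definitive solution: give each `gene` feature a new zero-padded ID the first time it is seen, then swap the gene part of every `ID=` and `Parent=` value while keeping suffixes such as `.1` or `.2.exon3`. The function name `rename_gff_ids` and its `prefix` and `width` parameters are made up for illustration. The sketch assumes that each gene line comes before its mRNA and exon lines, that child IDs are the gene ID plus a dot-separated suffix, and that the first eight columns contain no spaces.

```python
import re

# Matches ID=... and Parent=... values in the attribute column.
ATTR_RE = re.compile(r"\b(ID|Parent)=([^;\s]+)")
GENE_ID_RE = re.compile(r"\bID=([^;\s]+)")


def rename_gff_ids(lines, prefix="dd_", width=4):
    """Return the GFF lines with gene IDs renumbered consecutively.

    Assumption: every gene line appears before its children.
    """
    gene_map = {}  # old gene ID -> new gene ID

    def new_name(old):
        # Try the longest dot-separated prefix first, so that
        # "gene.1.exon2" is matched against the known gene ID "gene".
        parts = old.split(".")
        for i in range(len(parts), 0, -1):
            head = ".".join(parts[:i])
            if head in gene_map:
                return gene_map[head] + old[len(head):]
        return old  # unknown ID: leave it unchanged

    out = []
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split()
        # Pass comments, "###" separators, and short lines through as-is.
        if line.startswith("#") or len(fields) < 9:
            out.append(line)
            continue
        if fields[2] == "gene":
            m = GENE_ID_RE.search(line)
            if m and m.group(1) not in gene_map:
                number = len(gene_map) + 1
                gene_map[m.group(1)] = f"{prefix}{number:0{width}d}"
        out.append(
            ATTR_RE.sub(lambda m: f"{m.group(1)}={new_name(m.group(2))}", line)
        )
    return out
```

To run it on a file, pass an open file handle as `lines` and write the returned lines to a new file, one per line. Only the `ID=` and `Parent=` values are changed, so the original column separators and any other attributes are left as they were.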