我在下面有一个代码,试图识别给定 DNA 序列的起始和结束密码子的位置。我们将起始密码子定义为ATG序列,将结束密码子定义为TGA、TAA、TAG序列。
我遇到的问题是下面的代码仅适用于前两个序列(DM208659 和 AF038953),但不适用于其余的。
我下面的方法有什么问题?
此代码可以从此处复制粘贴。
#!/usr/bin/perl -w
while (<DATA>) {
chomp;
print "$_\n";
my ($id,$rna_sq) = split(/\s+/,$_);
local $_ = $rna_sq;
while (/atg/g) {
my $start = pos() - 2;
if (/tga|taa|tag/g) {
my $stop = pos();
my $gene = substr( $_, $start - 1, $stop - $start + 1 ),$/;
my $genelen = length($gene);
my $ct = "$id $start $stop $gene $genelen";
print "\t$ct\n";
}
}
}
__DATA__
DM208659 gtgggcctcaaatgtggagcactattctgatgtccaagtggaaagtgctgcgacatttgagcgtcac
AF038953 gatcccagacctcggcttgcagtagtgttagactgaagataaagtaagtgctgtttgggctaacaggatctcctcttgcagtctgcagcccaggacgctgattccagcagcgccttaccgcgcagcccgaagattcactatggtgaaaatcgccttcaatacccctaccgccgtgcaaaaggaggaggcgcggcaagacgtggaggccctcctgagccgcacggtcagaactcagatactgaccggcaaggagctccgagttgccacccaggaaaaagagggctcctctgggagatgtatgcttactctcttaggcctttcattcatcttggcaggacttattgttggtggagcctgcatttacaagtacttcatgcccaagagcaccatttaccgtggagagatgtgcttttttgattctgaggatcctgcaaattcccttcgtggaggagagcctaacttcctgcctgtgactgaggaggctgacattcgtgaggatgacaacattgcaatcattgatgtgcctgtccccagtttctctgatagtgaccctgcagcaattattcatgactttgaaaagggaatgactgcttacctggacttgttgctggggaactgctatctgatgcccctcaatacttctattgttatgcctccaaaaaatctggtagagctctttggcaaactggcgagtggcagatatctgcctcaaacttatgtggttcgagaagacctagttgctgtggaggaaattcgtgatgttagtaaccttggcatctttatttaccaactttgcaataacagaaagtccttccgccttcgtcgcagagacctcttgctgggtttcaacaaacgtgccattgataaatgctggaagattagacacttccccaacgaatttattgttgagaccaagatctgtcaagagtaagaggcaacagatagagtgtccttggtaataagaagtcagagatttacaatatgactttaacattaaggtttatgggatactcaagatatttactcatgcatttactctattgcttatgccgtaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
BC021011 ggggagtccggggcggcgcctggaggcggagccgcccgctgggctaaatggggcagaggccgggaggggtgggggttccccgcgccgcagccatggagcagcttcgcgccgccgcccgtctgcagattgttctg
DM208660 gggatactcaaaatgggggcgctttcctttttgtctgtactgggaagtgcttcgattttggggtgtccc
AF038954 ggacccaagggggccttcgaggtgccttaggccgcttgccttgctctcagaatcgctgccgccatggctagtcagtctcaggggattcagcagctgctgcaggccgagaagcgggcagccgagaaggtgtccgaggcccgcaaaagaaagaaccggaggctgaagcaggccaaagaagaagctcaggctgaaattgaacagtaccgcctgcagagggagaaagaattcaaggccaaggaagctgcggcattgggatcccgtggcagttgcagcactgaagtggagaaggagacccaggagaagatgaccatcctccagacatacttccggcagaacagggatgaagtcttggacaacctcttggcttttgtctgtgacattcggccagaaatccatgaaaactaccgcataaatggatagaagagagaagcacctgtgctgtggagtggcattttagatgccctcacgaatatggaagcttagcacagctctagttacattcttaggagatggccattaaattatttccatatattataagagaggtccttccactttttggagagtagccaatctagctttttggtaacagacttagaaattagcaaagatgtccagctttttaccacagattcctgagggattttagatgggtaaatagagtcagactttgaccaggttttgggcaaagcacatgtatatcagtgtggacttttcctttcttagatctagtttaaaaaaaaaaaccccttaccattctttgaagaaaggaggggattaaataattttttcccctaacactttcttgaaggtcaggggctttatctatgaaaagttagtaaatagttctttgtaacctgtgtgaagcagcagccagccttaaagtagtccattcttgctaatggttagaacagtgaatactagtggaattgtttgggctgcttttagtttctcttaatcaaaattactagatgatagaattcaagaacttgttacatgtattacttggtgtatcgataatcatttaaaagtaaagactctgtcatgcaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa