0

我试图找到所有可能的最小核苷酸长度的阅读框。

"A[TU]G(?:(...){3}){%d,}?(?:[TU]AG|[TU]AA|[TU]GA)" % (minimal_aa) 

这几乎可以满足我的要求,但是由于某种原因,某些阅读框不承认某些终止密码子。

我确定这与该(...)部分有关。我如何告诉它总是停在[TU]AG|[TU]AA|[TU]GA,尽管通过多个起始密码子很好。

我在 Eclipse 上使用 Python。

我正在使用 Pythex.org 来检查我的字符串,但这里有一个我正在谈论的示例:

AUGGAGAGCCUUGUUCUUGGUGUCAACGAGAAAACACACGUCCAACUCAGUUUGCCUGUCCUUCAGGUUAGAGACGUGCUAGUGCGUGGCUUCGGGGACUCUGUGGAAGAGGCCCUAUCGGAGGCACGUGAACACCUCAAAAAUGGCACUUGUGGUCUAGUAGAGCUGGAAAAAGGCGUACUGCCCCAGCUUGAACAGCCCUAUGUGUUCAUUAAACGUUCUGAUGCCUUAAGCACCAAUCACGGCCACAAGGUCGUUGAGCUGGUUGCAGAAAUGGACGGCAUUCAGUACGGUCGUAGCGGUAUAACACUGGGAGUACUCGUGCCACAUGUGGGCGAAACCCCAAUUGCAUACCGCAAUGUUCUUCUUCGUAAGAACGGUAAUAAGGGAGCCGGUGGUCAUAGCUAUGGCAUCGAUCUAAAGUCUUAUGACUUAGGUGACGAGCUUGGCACUGAUCCCAUUGAAGAUUAUGAACAAAACUGGAACACUAAGCAUGGCAGUGGUGCACUCCGUGAACUCACUCGUGAGCUCAAUGGAGGUGCAGUCACUCGCUAUGUCGACAACAAUUUCUGUGGCCCAGAUGGGUACCCUCUUGAUUGCAUCAAAGAUUUUCUCGCACGCGCGGGCAAGUCAAUGUGCACUCUUUCCGAACAACUUGAUUACAUCGAGUCGAAGAGAGGUGUCUACUGCUGCCGUGACCAUGAGCAUGAAAUUGCCUGGUUCACUGAGCGCUCUGAUAAGAGCUACGAGCACCAGACACCCUUCGAAAUUAAGAGUGCCAAGAAAUUUGACACUUUCAAAGGGGAAUGCCCAAAGUUUGUGUUUCCUCUUAACUCAAAAGUCAAAGUCAUUCAACCACGUGUUGAAAAGAAAAAGACUGAGGGUUUCAUGGGGCGUAUACGCUCUGUGUACCCUGUUGCAUCUCCACAGGAGUGUAACAAUAUGCACUUGUCUACCUUGAUGAAAUGUAAUCAUUGCGAUGAAGUUUCAUGGCAGACGUGCGACUUUCUGAAAGCCACUUGUGAACAUUGUGGCACUGAAAAUUUAGUUAUUGAAGGACCUACUACAUGUGGGUACCUACCUACUAAUGCUGUAGUGAAAAUGCCAUGUCCUGCCUGUCAAGACCCAGAGAUUGGACCUGAGCAUAGUGUUGCAGAUUAUCACAACCACUCAAACAUUGAAACUCGACUCCGCAAGGGAGGUAGGACUAGAUGUUUUGGAGGCUGUGUGUUUGCCUAUGUUGGCUGCUAUAAUAAGCGUGCCUACUGGGUUCCUCGUGCUAGUGCUGAUAUUGGCUCAGGCCAUACUGGCAUUAAACUGGGUUCCUCGUGCUAGUGCUGAUAUUGGCUCAGGCCAUACUGGCAUUAAACUGGGUUCCUCGUGCUAGUGCUGAUAUUGGCUCAGGCCAUACUGGCAUUAA

等待。这是一个不好的例子。因为它现在实际上检查了我的输出。我不得不缩短它,但是有一个代码有几千个核苷酸,其中充满了终止密码子,并且没有任何工作正常。我希望你明白我的意思,如果没有,请不要担心。

提前感谢朋友们!

4

1 回答 1

1

尝试使用这种模式找到所有小的并最终重叠的序列:

(?=A[TU]G((?:.{3})+?)[TU](?:AG|AA|GA))

您可以在没有起始和终止密码子的情况下在捕获组 1 中找到每个序列。

于 2013-09-12T19:00:12.240 回答