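I looked through other questions on this topic, but couldn't find anything that really addresses what I'm trying to figure out.
The problem is this: I'm trying to write a program that looks for palindromes across two complementary strands of DNA and returns the position and length of each palindrome it identifies.
For example, given the sequence TTGATATCTT, the program should find the complement (AACTATAGAA) and then identify index 2 (counting from 0) as the start of the 6-character palindrome GATATC.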
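To make the expected result concrete, here is a minimal sketch of the check described above. It is not the program in question: the helper names `complement_strand` and `is_reverse_palindrome` are made up for illustration, and it assumes the sequence contains only A, C, G, and T.

```python
def complement_strand(seq):
    # Swap each base for its partner (A<->T, C<->G).
    # Assumes seq contains only the characters A, C, G, T.
    return seq.translate(str.maketrans("ACGT", "TGCA"))


def is_reverse_palindrome(fragment):
    # A fragment counts as a palindrome here when it reads the same
    # as its complement read backwards.
    return fragment == complement_strand(fragment)[::-1]


seq = "TTGATATCTT"
print(complement_strand(seq))            # AACTATAGAA
print(is_reverse_palindrome(seq[2:8]))   # True (the fragment is GATATC)
```

The slice `seq[2:8]` starts at index 2 and is 6 characters long, which matches the example above.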
I'm new to programming, so this may look silly, but the code I came up with looks like this:
'''This first part imports the sequence (usually consisting of multiple lines of text)
from a file. I have a feeling there's an easier way of doing this, but I just don't
know what that would be.'''
length = 4
li = []
for line in open("C:/Python33/Stuff/Rosalind/rosalind_revp.txt"):
    if line[0] != ">":
        li.append(line)
seq = (''.join(li))

'''The complement() function takes the starting sequence and creates its complement'''
def complement(seq):
    li = []
    t = int(len(seq))
    for i in range(0, t):
        n = (seq[i])
        if n == "A":
            li.append(n.replace("A", "T"))
        if n == "T":
            li.append(n.replace("T", "A"))
        if n == "C":
            li.append(n.replace("C", "G"))
        if n == "G":
            li.append(n.replace("G", "C"))
    answer = (''.join(li))
    return(answer)

'''the ip() function goes letter by letter, testing to see if it matches with the letter
x spaces in front of it on the complementary strand(x being specified by length). If the
letter doesn't match, it continues to the next one. After checking all possibilities for
one length, the function runs again with the argument length+1.'''
def ip(length, seq):
    n = 0
    comp = complement(seq)
    while length + n <= (len(seq)):
        for i in range(0, length-1):
            if seq[n + i] != comp[n + length - 1 - i]:
                n += 1
                break
            if (n + i) > (n + length - 1 - i):
                print(n + 1, length)
                n += 1
    if length <= 12:
        ip(length + 1, seq)

ip(length, seq)
With short sequences (for example, TCAATGCATGCGGGTCTATATGCAT) this works perfectly, but with longer sequences I always get the following error message:
Traceback (most recent call last):
  File "C:/Python33/Stuff/Ongoing/palindrome.py", line 48, in <module>
    ip(length, seq)
  File "C:/Python33/Stuff/Ongoing/palindrome.py", line 39, in ip
    if seq[n + i] != comp[n + length - 1 - i]:
IndexError: string index out of range
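The message appears after the program has finished checking all possible 4-character palindromes, and before the function is called again with length + 1.
I understand what the message is saying, but I don't understand why I'm getting it. Why does this work for some strings and not for others? For the past hour I've been checking whether it matters if the sequence has an odd or even number of characters, is or isn't a multiple of 4, and so on. I'm stumped. What am I missing?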
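Both `seq` and `comp` are indexed on the failing line, so one check that might narrow this down is whether the two strings have the same length. The sketch below is only a diagnostic under stated assumptions: the `lines` list is made-up data standing in for the real file (the header label is hypothetical), and `complement` is a compact stand-in intended to behave like the function in the code above.

```python
def complement(seq):
    # Stand-in for complement() in the code above: only A, T, C, G
    # produce output, any other character is silently skipped.
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in seq if base in pairs)


# Hypothetical file contents; the real rosalind_revp.txt is not shown.
lines = [">Rosalind_0000\n", "TCAATGCATGCG\n", "GGTCTATATGCAT\n"]

# Same filtering and joining as in the code above.
seq = "".join(line for line in lines if line[0] != ">")

print(len(seq), len(complement(seq)))  # 27 25
print(repr(seq))                       # repr() makes non-base characters visible
```

If the two lengths differ for the real input, the characters shown by `repr(seq)` that are not A, C, G, or T would be the place to look.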
Any help would be appreciated.
P.S. The problem comes from the Rosalind website (Rosalind.info), which uses 1-based numbering. Hence the print(n + 1, length) at the end.