1

我是一名业余程序员(我的实际专业是生物学),所以如果代码很糟糕,我深表歉意。

无论如何,我正在做一个 rosalind.info 练习(http://rosalind.info/problems/subs/),希望我找到一个特定 DNA 基序包含在更大 DNA 序列中的每个索引。基本上,我需要在字符串中找到子字符串的索引。应该很容易吧?好吧,也许你可以帮助我。

所以这是我的代码:

with open('rosalind_subs.txt') as f:
    seq = f.readline()
    seq.strip()
    subs = f.readline()
    subs.strip()
    break

def finder(x, y):
    index = x.find(y)
    return index

print("sequence is: " + seq)
print("subs is: " + subs)

print(finder(seq, subs))

这是我的输出:

sequence is:  ACCAGTCTCTTTTTTCTCTTTTCTCTTTTCTCTTTTGACCCTCTTTTCGTCACTCTTTTACCTCTTTTTCTCTTTTACTCTTTTCTCTTTTACTCTTTTACTCTTTTAGCGCAGATCTCTTTTCTCTTTTGGCTCTTTTGTCATCCTCTTTTAGACTCTTTTGGGAAGCGACGCCTCTTTTCTCTTTTCTCTTTTGCCTCTTTTTATAACCTAAAAGACTCTTTTCCCTCTTTTCCGATTTGCCAAGGGCTCTCTTTTCTCTTTTGCTCTTTTCTCTTTTCTCTTTTTACTCTTTTCTCTTTTCGCCCCAAGATTAACTCTTTTTCTCTTTTCTCTCTTTTTTCCTCTTTTCTCTTTTGAATTGACCTCTTTTTCTCTTTTTTTGGGCCGCTCTTTTCTCTTTTACTCTTTTCTCTCTTTTAACAGCTCTTTTCCTTCTCTTTTGTCTCTTTTAGTATACTCTTTTACTCTTTTCTCTTTTCTCTCTTTTACTCTTTTGCTCTTTTCTCTTTTTGTCTCTTTTGCCCTGTCTCTTTTCACGCTTCTCTTTTAGTGTACTCTTTTACTCTTTTTGGCTCTTTTCGAATTTGTTAGCTCTTTTGCTCTTTTCTCTTTTGCTCTTTTGTCTCTTTTGATCAGATTCTCTTTTTCTCTTTTCTCTTTTCCTTAAGCAGATTTCTCTTTTCTCTTTTTCTCTCTTTTGCTCTTTTACTCTTTTACTGCTTTCTCTTTTACAACCTCTTTTACTCTTTTAAGCTCTTTTCTCTTTTGCGCCTCTTTTCCTCCCCTCTTTTTAGCTCTTTTCTCTTTTTCGCTCTTTTCAGCTCTTTTCACTCTTTTGTTTTGAGCTCTTTTCAGACTCTTTTATCCTCTTTTTTCCTCTTTTAGCGCTCTTTTGTAGCCTCTTTT

motif is: CTCTTTTCT

-1

***Repl Closed***

我把它***Repl Closed***留在那里,努力不遗余力。也许它与 Sublime REPL 有关?

无论如何,您可能无法仅通过查看来判断,但实际上在 DNA 序列中多次发现该基序,只是查找功能没有发现它。是什么赋予了?

4

2 回答 2

1

break 不适用于 with 范围。请删除并尝试。我已经测试了下面的代码。

with open('rosalind_subs.txt') as f:
    seq = f.readline()
    seq.strip()
    subs = f.readline()
    subs.strip()

def finder(x, y):
    index = x.find(y)
    return index

print("sequence is: " + seq)
print("subs is: " + subs)

print(finder(seq, subs))

输出是

>>> 
sequence is: ACCAGTCTCTTTTTTCTCTTTTCTCTTTTCTCTTTTGACCCTCTTTTCGTCACTCTTTTACCTCTTTTTCTCTTTTACTCTTTTCTCTTTTACTCTTTTACTCTTTTAGCGCAGATCTCTTTTCTCTTTTGGCTCTTTTGTCATCCTCTTTTAGACTCTTTTGGGAAGCGACGCCTCTTTTCTCTTTTCTCTTTTGCCTCTTTTTATAACCTAAAAGACTCTTTTCCCTCTTTTCCGATTTGCCAAGGGCTCTCTTTTCTCTTTTGCTCTTTTCTCTTTTCTCTTTTTACTCTTTTCTCTTTTCGCCCCAAGATTAACTCTTTTTCTCTTTTCTCTCTTTTTTCCTCTTTTCTCTTTTGAATTGACCTCTTTTTCTCTTTTTTTGGGCCGCTCTTTTCTCTTTTACTCTTTTCTCTCTTTTAACAGCTCTTTTCCTTCTCTTTTGTCTCTTTTAGTATACTCTTTTACTCTTTTCTCTTTTCTCTCTTTTACTCTTTTGCTCTTTTCTCTTTTTGTCTCTTTTGCCCTGTCTCTTTTCACGCTTCTCTTTTAGTGTACTCTTTTACTCTTTTTGGCTCTTTTCGAATTTGTTAGCTCTTTTGCTCTTTTCTCTTTTGCTCTTTTGTCTCTTTTGATCAGATTCTCTTTTTCTCTTTTCTCTTTTCCTTAAGCAGATTTCTCTTTTCTCTTTTTCTCTCTTTTGCTCTTTTACTCTTTTACTGCTTTCTCTTTTACAACCTCTTTTACTCTTTTAAGCTCTTTTCTCTTTTGCGCCTCTTTTCCTCCCCTCTTTTTAGCTCTTTTCTCTTTTTCGCTCTTTTCAGCTCTTTTCACTCTTTTGTTTTGAGCTCTTTTCAGACTCTTTTATCCTCTTTTTTCCTCTTTTAGCGCTCTTTTGTAGCCTCTTTT

subs is: CTCTTTTCT
15
于 2015-11-26T14:21:57.680 回答
0

也是一位在这里做过几次 rosalind.info 练习的生物学家。

首先,可以通过使用来改进要读取序列和主题的代码splitlines(),它负责删除换行符。还要注意我如何使用元组解包来同时分配seqmotif变量。

with open('rosalind_subs.txt') as f:
    seq, motif = f.read().splitlines()

接下来,您正确地注意到find它只返回您的主题第一次出现的索引。要查找所有出现的情况,知道 find 需要另一个可选参数会有所帮助start。如果您提供它,它将从该索引位置开始查找。在循环中使用它,您将获得所有索引。

另一种方法是使用正则表达式。请注意,主题可能会相互重叠,因此您需要使用前瞻断言

于 2015-11-26T15:00:32.423 回答