我假设你正在做这个练习,或者类似的东西:http ://www.cs.hofstra.edu/~cscccl/csc15p/dnalab.txt
我不确定起始示例是否特别有用。这很难理解,因此很难以指定的方式适应。我会就如何解决这些问题提出一些建议:
- 把问题分解成更容易解决的小问题。
- 编写测试每个函数的代码会给出您期望的结果。
- 使用变量和函数名称来明确每个含义,以便您自己和向其展示代码的人更容易理解代码。
你要做的是沿着字符串 B 滑动字符串 A 并在每个位置测试它的匹配程度,记住你找到的最佳匹配。分离问题的一个简单部分是衡量在两个字符串的某些特定对齐方式下获得的匹配程度。
这是我对解决方案的一般形式的建议:
def string_distance(A, B, index_b):
'''Count how many substitutions are required to
change string A into the string beginning at
index_b in string B.'''
return 999 # FIXME
def subsequence(A, B, n):
'''Search for the substring of B that is equal to
A except for the minimal number of substitutions,
and return a tuple of the index of that substring
in B and the number of substitutions required,
unless there is no such substring with n or fewer
substitutions, in which case return the tuple of
(-1, -1).'''
answer_distance = -1
answer_index = -1
index = 0
while index <= len(B) - len(A):
substitutions_required = string_distance(A, B, index)
# Does a match at this location need n or fewer substitutions?
is_close_enough = False # FIXME
# Is a match at this location better than the best match so far?
is_better = False # FIXME
if is_close_enough and is_better:
answer_distance = substitutions_required
answer_index = index
index += 1
return answer_index, answer_distance
以下是一些基本测试的样子:
def test_string_distance():
test_data = [
# a b index_b expected
("ABC", "ABCDEF", 0, 0),
("ABX", "ABCDEF", 0, 1),
("XYZ", "ABCDEF", 0, 3),
("XBC", "ABCDEF", 0, 1),
("CEE", "ABCDEF", 2, 1),
("DEF", "ABCDEF", 3, 0),
("AAAAA", "BBBBBBBB", 3, 5),
("BAAAA", "BBBBBBBB", 3, 4),
("ABAAB", "BBBBBBBB", 3, 3),
]
for a, b, index_b, expected in test_data:
result = string_distance(a, b, index_b)
if result != expected:
print "string_distance({}, {}, {}) was {} but should be {}".format(
a, b, index_b, result, expected)
def test_subsequence():
test_data = [
# A B n expected
("AXY", "AYAXXXAAYYAX", 3, (2,1)),
("AXY", "AYAXXXAAYYAX", 2, (2,1)),
("AXY", "AYAXXXAAYYAX", 1, (2,1)),
("AXY", "AYAXXXAAYYAX", 0, (-1,-1)),
("XAAY", "AYAXXXAAYYAX", 2, (5,0)),
("XAAY", "XXAXAAXAAY", 2, (6,0)),
("ABCDEF", "ZZAABAXCDEEEF", 3, (5,2)),
("ABCDEF", "ZZAABAXCDEEEF", 2, (5,2)),
("ABCDEF", "ZZAABAXCDEEEF", 1, (-1,-1)),
("ABCDEF", "ZZAABXBCDEEEF", 3, (5,2)),
]
for a, b, n, expected in test_data:
result = subsequence(a, b, n)
if result != expected:
print "test_subsequence({}, {}, {}) was {} but should be {}".format(
a, b, n, result, expected)
if __name__ == '__main__':
test_string_distance()
test_subsequence()
首先找出将通过这些测试的 string_distance 的实现。然后让子序列工作。