0

我需要弄清楚 A 是否是 B 的子字符串,同时允许一定数量的错误 (n)。A 和 B 都是字符串,n 是整数。最大的问题是我的内部 while 循环。我不确定如何对其进行编码,以便输出我想要的内容。它应该通过 B 循环寻找与 A 匹配的 n 个错误。如果 A 是“abcd”而 B 是“bdefabddghl”,那么输出应该是在位置 4 找到了 A,但有 1 个错误。这是我现在拥有的代码。我只需要知道如何编写内部 while 循环。任何帮助是极大的赞赏。

def subsequence(A,B,n):
answer = -1             # assume worst case for there-exists type of loop
j = 0                   # index b
while j<=len(B)-len(A) and answer==-1:  # there exists j in b loop
    bx = j              # assume a is found in b starting at b[j]
    i = 0               # assume best case for a for-all type of loop
    axy = n             # accumulator for n
        while i<len(A) and bx==j and axy > 0:   # for all i in a
        if A[i] == B[j-i] and axy > 0:
            bx = j
            axy = n         # accumulator for n
        if A[i] != B[j-i] and axy > 0:
            axy -= 1
        i+=1
        # while i
    j+=1
# while j
end = "A best match with " + str(n-axy) + " errors was found starting at position " + str(bx)."
return end
print subsequence("abcd","bcjabddec",3)

谢谢

4

2 回答 2

0

我假设你正在做这个练习,或者类似的东西:http ://www.cs.hofstra.edu/~cscccl/csc15p/dnalab.txt

我不确定起始示例是否特别有用。这很难理解,因此很难以指定的方式适应。我会就如何解决这些问题提出一些建议:

  1. 把问题分解成更容易解决的小问题。
  2. 编写测试每个函数的代码会给出您期望的结果。
  3. 使用变量和函数名称来明确每个含义,以便您自己和向其展示代码的人更容易理解代码。

你要做的是沿着字符串 B 滑动字符串 A 并在每个位置测试它的匹配程度,记住你找到的最佳匹配。分离问题的一个简单部分是衡量在两个字符串的某些特定对齐方式下获得的匹配程度。

这是我对解决方案的一般形式的建议:

def string_distance(A, B, index_b):
    '''Count how many substitutions are required to
    change string A into the string beginning at
    index_b in string B.'''
    return 999 # FIXME

def subsequence(A, B, n):
    '''Search for the substring of B that is equal to
    A except for the minimal number of substitutions,
    and return a tuple of the index of that substring
    in B and the number of substitutions required,
    unless there is no such substring with n or fewer
    substitutions, in which case return the tuple of
    (-1, -1).'''
    answer_distance = -1
    answer_index = -1
    index = 0
    while index <= len(B) - len(A):
        substitutions_required = string_distance(A, B, index)
        # Does a match at this location need n or fewer substitutions?
        is_close_enough = False # FIXME
        # Is a match at this location better than the best match so far?
        is_better = False # FIXME
        if is_close_enough and is_better:
            answer_distance = substitutions_required
            answer_index = index
        index += 1
    return answer_index, answer_distance

以下是一些基本测试的样子:

def test_string_distance():
    test_data = [
        # a         b          index_b   expected
        ("ABC",    "ABCDEF",   0,        0),
        ("ABX",    "ABCDEF",   0,        1),
        ("XYZ",    "ABCDEF",   0,        3),
        ("XBC",    "ABCDEF",   0,        1),
        ("CEE",    "ABCDEF",   2,        1),
        ("DEF",    "ABCDEF",   3,        0),
        ("AAAAA",  "BBBBBBBB", 3,        5),
        ("BAAAA",  "BBBBBBBB", 3,        4),
        ("ABAAB",  "BBBBBBBB", 3,        3),
    ]
    for a, b, index_b, expected in test_data:
        result = string_distance(a, b, index_b)
        if result != expected:
            print "string_distance({}, {}, {}) was {} but should be {}".format(
                    a, b, index_b, result, expected)

def test_subsequence():
    test_data = [
        # A         B               n   expected
        ("AXY",    "AYAXXXAAYYAX",  3,  (2,1)),
        ("AXY",    "AYAXXXAAYYAX",  2,  (2,1)),
        ("AXY",    "AYAXXXAAYYAX",  1,  (2,1)),
        ("AXY",    "AYAXXXAAYYAX",  0,  (-1,-1)),
        ("XAAY",   "AYAXXXAAYYAX",  2,  (5,0)),
        ("XAAY",   "XXAXAAXAAY",    2,  (6,0)),
        ("ABCDEF", "ZZAABAXCDEEEF", 3,  (5,2)),
        ("ABCDEF", "ZZAABAXCDEEEF", 2,  (5,2)),
        ("ABCDEF", "ZZAABAXCDEEEF", 1,  (-1,-1)),
        ("ABCDEF", "ZZAABXBCDEEEF", 3,  (5,2)),
    ]
    for a, b, n, expected in test_data:
        result = subsequence(a, b, n)
        if result != expected:
            print "test_subsequence({}, {}, {}) was {} but should be {}".format(
                    a, b, n, result, expected)

if __name__ == '__main__':
    test_string_distance()
    test_subsequence()

首先找出将通过这些测试的 string_distance 的实现。然后让子序列工作。

于 2013-11-08T23:04:32.487 回答
0

您可能应该看一下difflib.get_close_matches功能。它几乎做同样的事情。如果你真的想实现它,通过简单地构建一个相同长度的所有子序列的列表并按匹配数对它们进行排序来实现它要容易得多。

def subsequence(a, b, n):

    sb = [(i, b[i:i+len(a)]) for i in xrange(len(b) - len(a) + 1)]

    def count_matches(s):
        return sum(int(x == y) for (x, y) in zip(a, s[1]))

    sb.sort(key=count_matches, reverse=True)

    best = sb[0]

    e = len(a) - count_matches(best)
    i = best[0]
    w = best[1]

    print 'A best match with %d errors was found starting at position %d' % (e, i)
于 2013-11-08T23:06:11.693 回答