python - 我的算法的运行时间复杂度 - 我如何计算并进一步优化算法？

Question

我设计了一个递归算法并用 Python 写下来。当我用不同的参数测量运行时间时，似乎需要指数级的时间。此外; 以 50 之类的小数字结束需要半个多小时。（我没有等到它完成，但它似乎没有在合理的时间内完成，猜测它是指数的）。

所以，我很好奇这个算法的运行时间复杂度。有人可以帮我推导出方程 T(n,m) 吗？或者计算大哦？

算法如下：

# parameters:
# search string, the index where we left on the search string, source string, index where we left on the source string,
# and the indexes array, which keeps track of the indexes found for the characters
def find(search, searchIndex, source, sourceIndex, indexes):
    found = None
    if searchIndex < len(search): # if we haven't reached the end of the search string yet
        found = False
        while sourceIndex < len(source): # loop thru the source, from where we left off
            if search[searchIndex] == source[sourceIndex]: # if there is a character match
                # recursively look for the next character of search string 
                # to see if it can be found in the remaining part of the source string
                if find(search, searchIndex + 1, source, sourceIndex + 1, indexes):
                    # we have found it
                    found = True # set found = true
                    # if an index for the character in search string has never been found before.
                    # i.e if this is the first time we are finding a place for that current character
                    if indexes[searchIndex] is None:
                        indexes[searchIndex] = sourceIndex # set the index where a match is found
                    # otherwise, if an index has been set before but it's different from what
                    # we are trying to set right now. so that character can be at multiple places.
                    elif indexes[searchIndex] != sourceIndex: 
                        indexes[searchIndex] = -1 # then set it to -1.
            # increment sourceIndex at each iteration so as to look for the remaining part of the source string. 
            sourceIndex = sourceIndex + 1
    return found if found is not None else True

def theCards(N, colors):
    # allcards: a list 1..N of characters where allcards[i] is 'R' if i is a prime number, 'B' otherwise.
    # so in this example where N=7, allcards=['B','R','R','B','R','B','R']
    allcards = ['R' if isPrime(i) else 'B' for i in range(1, N + 1)]
    # indexes is initially None.
    indexes = [None] * len(colors)

    find(colors, 0, allcards, 0, indexes)
    return indexes    

if __name__ == "__main__":
    print theCards(7, list("BBB"))

我不知道是否必须了解问题和算法才能得出最坏情况的运行时间，但这是我试图解决的问题：

问题：

给定一个源字符串 SRC 和一个搜索字符串 SEA，在 SRC 中找到子序列 SEA，并返回在 SRC 中找到 SEA 的每个字符的索引。如果 SEA 中的一个字符可以在 SRC 中的多个位置，则为该字符位置返回 -1。

例如; 如果源字符串是 BRRBRBR (N=7) 并且搜索字符串是 BBB：那么 'BBB' 中的第一个 'B' 可以出现在搜索字符串的索引 0 处。第二个“B”可以位于搜索字符串的索引 3，最后一个“B”可以位于第 5 个位置。此外; 字符“BBB”的位置没有其他替代方案，因此算法返回 [0,3,5]。

在另一种情况下，源字符串是 BRRBRB (N=6) 并且搜索字符串是 RBR：“RBR”的第一个“R”可以位于位置 1 或 2。这仅留下位置 3 用于“B”和位置最后一个“R”为 4。然后，第一个“R”可以在多个位置，它的位置是模棱两可的。其他两个字符 B 和 R 只有一个位置。所以算法返回 [-1,4,5]。

] (N=58)，搜索字符串为 RBRRBRBBRBRRBBRRBBBRRBBBRR。它应该返回 [-1, -1, -1, -1, -1, -1, -1, -1, 17, 18, 19, 23, -1, -1, -1, -1, -1 , -1, -1, -1, -1, -1, -1, -1, 47, 53 ]，但不幸的是它没有 =(

优化：

当“索引”列表完全充满 -1 时，我想停止搜索。但这只会影响最好的情况（或者可能是平均情况），而不是最坏的情况。如何进一步优化该算法。我知道这个问题存在多项式解决方案。

比优化更重要的是，我真的很好奇运行时间的 T(n,m) 方程，其中 n 和 m 是源字符串和搜索字符串的长度。

如果你能读到这里，非常感谢你！=)

编辑 - IVIad 的解决方案已实施：

def find2(search, source):
    indexes = list()
    last = 0
    for ch in search:
        if last >= len(source):
            break
        while last < len(source) and source[last] != ch:
            last = last + 1
        indexes.append(last)
        last = last + 1
    return indexes

def theCards(N, colors):
    # allcards: a list 1..N of characters where allcards[i] is 'R' if i is a prime number, 'B' otherwise.
    allcards = ['R' if isPrime(i) else 'B' for i in range(1, N + 1)]

    indexes = find2(colors, allcards) # find the indexes of the first occurrences of the characters
    colors.reverse() # now reverse both strings
    allcards.reverse()
    # and find the indexes of the first occurrences of the characters, again, but in reversed order
    indexesreversed = find2(colors, allcards)
    indexesreversed.reverse() # reverse back the resulting list of indexes 
    indexesreversed = [len(allcards) - i - 1 for i in indexesreversed] # fix the indices

    # return -1 if the indices are different when strings are reversed
    return [indexes[i] + 1 if indexes[i] == indexesreversed[i] else - 1 for i in range(0, len(indexes))]

if __name__ == "__main__":
    print theCards(495, list("RBRRBRBBRBRRBBRRBBBRRBBBRR"))

score 4 · Accepted Answer

您可以在中执行此操作O(n + m)，其中m的字符数SEA和n中的字符数在哪里SRC：

last = 1
for i = 1 to m do
    while SRC[last] != SEA[i]
        ++last

    print last
    ++last (skip this match)

基本上，对于中的每个字符SEA，找到它在中的位置SRC，但只在找到前一个字符的位置之后扫描。

例如; 如果源字符串是 BRRBRBR (N=7) 并且搜索字符串是 BBB

然后： find Bin SRC: 在last = 1 print 1, set找到last = 2。

查找B于SRC：查找于last = 4、打印4、设置last = 5。

查找B于SRC：查找于last = 6、打印6、设置last = 7。完毕。

至于您的算法的复杂性，我无法提供非常正式的分析，但我会尝试解释我将如何去做。

假设所有字符在两者中和它们之间都是相等SRC的SEA。因此，我们可以消除您的 while 循环中的条件。另请注意，您的 while 循环执行n时间。

请注意，对于第一个字符，您将调用find(1, 1), ... find(m, n). 但是这些也将启动它们的 while 循环并进行它们自己的递归调用。每个都find(i, j)将O(m)在其 while 循环中进行递归调用 for i = 1 to n。但是这些反过来会自己进行更多的递归调用，从而导致一种导致指数复杂性的“雪崩效应”。

所以你有了：

i = 1: calls find(2, 2), find(3, 3), ..., find(m, n)
       find(2, 2) calls find(3, 3), ..., find(m, n)
       find(3, 3) calls find(4, 4), ..., find(m, n)
       find(4, 4) calls find(5, 5), ..., find(m, n)
       ...
       total calls: O(m^m)
i = 2: same, but start from find(2, 3).
...
i = n: same

因此，总复杂度看起来像O(n*m^m)。我希望这是有道理的，我没有犯任何错误。

score 3 · Accepted Answer

这只是最长的公共子序列问题。这可以通过动态编程来实现，以获得比指数时间少得多的时间。在您的情况下，当 LCS 返回 SEA 的长度时，您就知道序列 SEA 存在于 SRC 中，在修改算法时保存它们的索引是一件小事。这是一个很好的解释的链接。http://en.wikipedia.org/wiki/Longest_common_subsequence_problem

score 0 · Accepted Answer

快速浏览一下，您是在搜索、递归和回溯吗？

我相信为您的源字符串创建一个后缀数组将是一个好主意。
构造后缀数组的复杂度为 O(nlogn)。定位子字符串的时间复杂度为 O(logn)。

python - 我的算法的运行时间复杂度 - 我如何计算并进一步优化算法？

3 回答 3

Related

Reference