python - Efficiently finding the longest matching prefix string

Question

My current implementation is this:

def find_longest_matching_option(option, options):
    options = sorted(options, key=len)
    longest_matching_option = None
    for valid_option in options:
        # Don't want to treat "oreo" as matching "o",
        # match only if it's "o reo"
        if re.match(ur"^{}\s+".format(valid_option), option.strip()):
            longest_matching_option = valid_option
    return longest_matching_option

Some examples of what I'm trying to do:

"foo bar baz something", ["foo", "foo bar", "foo bar baz"]
# -> "foo bar baz"
"foo bar bazsomething", (same as above)
# -> "foo bar"
"hello world", ["hello", "something_else"]
# -> "hello"
"a b", ["a", "a b"]
# -> "a b" # Doesn't work in current impl.

Mostly, I'm looking for efficiency here. The current implementation works, but I've been told it's O(m^2 * n), which is pretty bad.

Thanks in advance!

score 2 · Accepted Answer

Let's start with a helper function, foo.

def foo(x, y):
    x, y = x.strip(), y.strip()
    return x == y or x.startswith(y + " ")

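foo returns True if the two strings are equal (after stripping surrounding whitespace), or if the second string followed by a space is a prefix of the first.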

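Next, given a case string and a list of options, you can use filter to find every option that is a valid prefix of the case string, then apply max with key=len to pick the longest one (see the tests below).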

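Here are a few test cases for foo. For demonstration purposes, I use partial to bind the case string as foo's first argument, so each resulting function can be passed straight to filter.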

from functools import partial

cases = ["foo bar baz something", "foo bar bazsomething", "hello world", "a b", "a b"]
options = [
      ["foo", "foo bar", "foo bar baz"], 
      ["foo", "foo bar", "foo bar baz"],
      ["hello", "something_else"],
      ["a", "a b"],
      ["a", "a b\t"]
]
p_list = [partial(foo, c) for c in cases]

for p, o in zip(p_list, options):
    print(max(filter(p, o), key=len))

foo bar baz
foo bar
hello
a b
a b
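
One caveat with the loop above: max raises ValueError when no option matches. Here is a minimal sketch that wraps the same idea in a single function, assuming Python 3.4+ for the default argument to max. The name longest_match is hypothetical and not part of the answer.

```python
def foo(x, y):
    # Same helper as in the answer: whole-word prefix test.
    x, y = x.strip(), y.strip()
    return x == y or x.startswith(y + " ")


def longest_match(case, options):
    # Hypothetical wrapper: keep only the options foo accepts,
    # then pick the longest one; return None when nothing matches.
    return max((o for o in options if foo(case, o)), key=len, default=None)


print(longest_match("foo bar bazsomething", ["foo", "foo bar", "foo bar baz"]))
print(longest_match("oreo", ["o"]))
```

This is still a single pass over the options with one startswith call each, so no sorting is needed.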

score 1 · Accepted Answer

A regex is overkill here; you can simply append a space to each string before comparing them and get the same result.

You also don't need to sort the data. Simply iterating over each value once is more efficient.

def find_longest_matching_option(option, options):
    # append a space so that find_longest_matching_option("a b", ["a b"])
    # works as expected
    option += ' '
    longest = None

    for valid_option in options:
        # append a space to each option so that only complete
        # words are matched
        valid_option += ' '
        if option.startswith(valid_option):
            # remember the longest match
            if longest is None or len(longest) < len(valid_option):
                longest = valid_option

    if longest is not None:
        # remove the trailing space
        longest = longest[:-1]
    return longest
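
Note that the function above only treats a single space as a word boundary, whereas the \s+ in the question also accepted tabs and repeated whitespace. If that matters, one possible variant is to compare whitespace-split words instead of raw characters. This is only a sketch under that assumption: the name find_longest_matching_option_ws is hypothetical, and unlike the regex it also treats runs of whitespace inside an option as equivalent to a single space.

```python
def find_longest_matching_option_ws(option, options):
    # Hypothetical variant: split on any whitespace and compare word lists,
    # so tabs and repeated spaces count as word boundaries.
    words = option.split()
    longest = None

    for valid_option in options:
        candidate = valid_option.split()
        # the option matches if its words are a prefix of the input's words
        if candidate and words[:len(candidate)] == candidate:
            # remember the longest match
            if longest is None or len(longest) < len(valid_option):
                longest = valid_option

    return longest


print(find_longest_matching_option_ws("a\tb", ["a", "a b"]))
print(find_longest_matching_option_ws("foo bar bazsomething", ["foo", "foo bar", "foo bar baz"]))
```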
