python - I have an array of strings; is there a way to see which one an argument string is closest to?

Question

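I'm working in Python, and I decided to break down and make a big list of phrases to compare against the results of a speech recognition module. So far, I have: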

phrases = [
    "what time is it",
    "what's the weather",
    "what's the date",
    "hello",
    "hi",
    "what's up",
    "how are you"
]

(I only started a few minutes ago, so I don't have much yet... mostly just an outline.) Anyway, I want a function like this...

def match(phrase):
    #match_greatest will start at zero but continuously update if the string
    #being compared has a higher percentage match
    match_greatest = 0

    #match will store the actual string that is closest
    match = ""

    for i in phrases:
        #this is the part I need help with...
        match_current = #somehow get the percentage that the argument phrase matches the phrase it's comparing to

        #if the current phrase is a closer match than before, update it
        if match_current > match_greatest:
            match_greatest = match_current
            match = i

    return match

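...As an example, if I called match("what time it a") or match("what time sat"), which are the kind of misreadings the speech recognition might give, then with my current set of phrases it should return "what time is it".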

score 2 · Accepted Answer

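One reasonable measure of distance between strings is the "edit distance", or Levenshtein distance. It counts the minimum number of single-character edits (insertions, deletions, and substitutions) needed to turn one string into the other.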

A Python implementation is available here; the algorithm itself is based on dynamic programming:

https://pypi.python.org/pypi/python-Levenshtein/

You can also implement the algorithm yourself; it is simple.
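As a rough illustration of how simple it is, here is a minimal sketch of the textbook dynamic-programming approach. The function name `levenshtein` is my own choice for this example and is not the API of the package linked above.

```python
def levenshtein(a, b):
    """Return the number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning i characters into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory use grows with the length of `b` rather than with the product of both lengths.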

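If you want a speech-oriented distance, it is worth considering Soundex. Rather than counting character edits, it encodes a word by how it sounds, so words with similar pronunciation get the same code. See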

https://pypi.python.org/pypi/Fuzzy
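To give a feel for what a phonetic code looks like, here is a small hand-written sketch of the classic American Soundex rules. It is an assumption-laden illustration: the `soundex` function below is hypothetical and is not the interface of the Fuzzy package, which may differ in details.

```python
def soundex(word):
    """Return a 4-character Soundex code: first letter plus three digits."""
    codes = {}
    for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0].upper()
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        digit = codes.get(ch, "")  # vowels map to "" and are not emitted
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated code after it counts
    return (result + "000")[:4]  # pad with zeros, truncate to four characters


# A misspelling that sounds the same ends up with the same code.
print(soundex("weather"), soundex("wether"))
```

Comparing codes word by word is one way to tolerate misheard words that a plain character comparison would penalize.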

You can then iterate over the strings and pick the one with the smallest edit distance.
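The loop itself might look like the sketch below. Note the swap: the standard library has no Levenshtein function, so this example uses `difflib.SequenceMatcher.ratio()` instead. That is a different similarity measure (a score between 0 and 1, higher means closer), so the loop takes the maximum rather than the minimum. The `phrases` list is the one from the question.

```python
from difflib import SequenceMatcher

phrases = [
    "what time is it",
    "what's the weather",
    "what's the date",
    "hello",
    "hi",
    "what's up",
    "how are you",
]


def match(phrase, candidates=phrases):
    """Return the candidate most similar to phrase."""
    def similarity(candidate):
        # ratio() is 1.0 for identical strings and 0.0 for nothing in common
        return SequenceMatcher(None, phrase, candidate).ratio()

    return max(candidates, key=similarity)


print(match("what time it a"))
print(match("what time sat"))
```

With a true edit distance, the same shape works with `min` and the distance function as the key.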

score 1 · Accepted Answer

Here's an example of how I would do it.

def match(phrase):
    phrases = [
        "what time is it",
        "what's the weather",
        "what's the date",
        "hello",
        "hi",
        "what's up",
        "how are you"
    ]

    # Map each known phrase to the percentage of its characters that equal
    # the character at the same position in the input phrase.
    match_word_dict = {}
    for element in phrases:
        sameness = 0
        for index in range(len(element)):
            # Stop once the input phrase runs out of characters
            if len(phrase) == index:
                break
            if phrase[index] == element[index]:
                sameness += 1

        percent = sameness * 100.0 / len(element)
        match_word_dict[element] = percent
    return match_word_dict

print(match("hello"))
print(match("hel"))

This returns a dictionary that maps each phrase to its percent match. Here's how I would go about printing only the phrase with the highest percent match:

key, value = max(match("hello").items(), key=lambda x: x[1])
print(key, value)

score 0 · Accepted Answer

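This is just one attempt. There are many more possibilities and test cases we could cover, and they would lead to more complex logic than what I have done below.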

phrases = {
    1: "what time is it",
    2: "what's the weather",
    3: "what's the date",
    4: "hello",
    5: "hi",
    6: "what's up",
    7: "how are you"
    }


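def match(phrase):
    """Return the known phrase sharing the most whole words with `phrase`.

    Falls back to returning `phrase` unchanged when no word matches.
    """
    phr_list = phrase.split()
    max_count = 0
    key = None

    for k, v in phrases.items():
        # Count the input words that also appear in this known phrase
        count = sum(1 for word in phr_list if word.lower() in v.split())

        if count > max_count:
            # Remember the best score seen so far
            max_count = count
            key = k

    if key:
        return phrases.get(key)
    return phrase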


print(match("what time it a"))

print(match("what time sit"))

print(match(" how you good"))

Output:

what time is it
what time is it
how are you
