
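I'm working in Python, and I've decided to break down and make a big list of phrases to compare against the results of a speech recognition module. So far I have: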

phrases = [
    "what time is it",
    "what's the weather",
    "what's the date",
    "hello",
    "hi",
    "what's up",
    "how are you"
]

(I only started a few minutes ago, so I don't have much yet; it's mostly just an outline.) Anyway, I want a function like this:

def match(phrase):
    #match_greatest will start at zero but continuously update if the string
    #being compared has a higher percentage match
    match_greatest = 0

    #match will store the actual string that is closest
    match = ""

    for i in phrases:
        #this is the part I need help with...
        match_current = #somehow get the percentage that the argument phrase matches the phrase it's comparing to

        #if the current phrase is a closer match than before, update it
        if match_current > match_greatest:
            match_greatest = match_current
            match = i

    return match

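...so that, for example, if I call match("what time it a") or match("what time sat") (examples of misreadings the speech recognition might give), with my current set of phrases, it would return "what time is it".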


3 Answers


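One reasonable distance between strings is the "edit distance", also called the Levenshtein distance. It counts the number of edits (insertions, deletions, and substitutions) needed to transform one string into another.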

A Python implementation is available here; the algorithm relies on dynamic programming:

https://pypi.python.org/pypi/python-Levenshtein/

You can also implement the algorithm yourself; it is fairly simple.

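If you want a speech-oriented distance, Soundex is worth considering. It is a phonetic algorithm that encodes a word by how it sounds, so similar-sounding words get the same code; it can complement Levenshtein when the errors are phonetic rather than typographic. See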

https://pypi.python.org/pypi/Fuzzy
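
To give a feel for what a phonetic code looks like, here is a minimal pure-Python sketch of the classic American Soundex rules. It is an illustration only, not the API of the Fuzzy package linked above; the names `soundex` and `_CODES` are made up for this example.

```python
# Consonant groups of classic American Soundex; vowels, y, h and w have no code.
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(word):
    """Return a 4-character Soundex code for a single word (sketch)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()          # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")    # its code still suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue                     # h and w do not separate equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code                      # a vowel resets prev to ""
    return (result + "000")[:4]          # pad with zeros, truncate to 4


print(soundex("Robert"))                         # R163
print(soundex("weather") == soundex("whether"))  # True
```

Homophones such as "weather" and "whether" get the same code here even though their spelling differs, which is the kind of error a speech recognizer tends to make.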

You can then iterate over the phrases and pick the one with the smallest edit distance.
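
As a rough sketch of that loop, assuming a hand-written Levenshtein function rather than the python-Levenshtein package (the names `levenshtein` and `closest_phrase` are illustrative, and the phrase list is the one from the question):

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions and substitutions
    needed to turn string a into string b."""
    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning i characters of a into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]


def closest_phrase(heard, candidates):
    """Return the candidate with the smallest edit distance to `heard`."""
    return min(candidates, key=lambda c: levenshtein(heard, c))


phrases = [
    "what time is it",
    "what's the weather",
    "what's the date",
    "hello",
    "hi",
    "what's up",
    "how are you",
]

print(closest_phrase("what time it a", phrases))  # what time is it
print(closest_phrase("what time sat", phrases))   # what time is it
```

Only two rows of the dynamic-programming table are kept at a time, which is enough because each cell depends only on the current and previous rows.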

answered 2015-03-28T18:28:39.673

Here's an example of how I would do it.

def match(phrase):
    phrases = [
        "what time is it",
        "what's the weather",
        "what's the date",
        "hello",
        "hi",
        "what's up",
        "how are you",
    ]

    match_word_dict = {}
    for element in phrases:
        # Count the positions where both strings have the same character;
        # zip stops at the end of the shorter string.
        sameness = sum(1 for a, b in zip(phrase, element) if a == b)

        # Share of the candidate's characters that matched, as a percentage.
        percent = sameness / len(element) * 100
        match_word_dict[element] = percent
    return match_word_dict

print(match("hello"))
print(match("hel"))

This returns a dictionary that maps each phrase to its percent match. Here's how I would go about printing only the phrase with the highest percent match:

key, value = max(match("hello").items(), key=lambda x: x[1])
print(key, value)
answered 2015-03-28T15:19:10.387

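This is just one attempt. There are many other possibilities and test cases we could account for, and they would lead to more complex logic than what I've done below.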

phrases = {
    1: "what time is it",
    2: "what's the weather",
    3: "what's the date",
    4: "hello",
    5: "hi",
    6: "what's up",
    7: "how are you",
}


def match(phrase):
    """Return the known phrase that shares the most words with `phrase`,
    or `phrase` itself if no known phrase shares any word.
    """
    phr_list = phrase.split()
    max_count = 0
    key = None

    for k, v in phrases.items():
        # Number of words in the input that also occur in this candidate.
        count = sum(1 for word in phr_list if word.lower() in v.split())

        if count > max_count:
            max_count = count  # remember the best score seen so far
            key = k

    if key is not None:
        return phrases[key]
    return phrase


print(match("what time it a"))

print(match("what time sit"))

print(match(" how you good"))

Output:

what time is it
what time is it
how are you
answered 2015-03-28T15:11:03.173