
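I have a small 30-line text file in which each line holds two similar words. I need to calculate the Levenshtein distance between the two words on each line, and I also need to use a memoize function while calculating the distance. I am new to Python and to algorithms in general, so this is proving very difficult for me. I can open and read the file, but I cannot figure out how to assign each of the two words to the variables "a" and "b" so that I can calculate the distance.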

Here is my current script, which so far only prints the document:

def memoize(f):
    cache = {}
    def wrapper(*args, **kwargs):
        # kwargs are not part of the cache key, so call with positional arguments only
        try:
            return cache[args]
        except KeyError:
            result = f(*args, **kwargs)
            cache[args] = result
            return result
    return wrapper

@memoize
def lev(a, b):
    n, m = len(a), len(b)
    if n > m:
        # make sure a is the shorter word so the rows stay short
        a, b = b, a
        n, m = m, n

    current = list(range(n + 1))
    for i in range(1, m + 1):
        previous, current = current, [i] + [0] * n
        for j in range(1, n + 1):
            add, delete = previous[j] + 1, current[j - 1] + 1
            change = previous[j - 1]
            if a[j - 1] != b[i - 1]:
                change = change + 1
            current[j] = min(add, delete, change)

    return current[n]

if __name__ == "__main__":
    with open('wordfile.txt', 'r') as f:
        for line in f:
            print(line)

Here are a few lines from the text file, to give you an idea:

prototype, prototype

proprietary, proprietary

recognize, recognize

exclude, exclude

tornado, tornado

happened, happened

void, vicinity

Here is an updated version of the script. It still does not work, but it is better:

class memoize:
    def __init__(self, function):
        self.function = function
        self.memoized = {}

    def __call__(self, *args):
        try:
            return self.memoized[args]
        except KeyError:
            self.memoized[args] = self.function(*args)
            return self.memoized[args]

@memoize
def lev(a,b):
    n, m = len(a), len(b)
    if n > m:
        a, b = b, a
        n, m = m, n
    current = range(n + 1)
    for i in range(1, m + 1):
        previous, current = current, [i] + [0] * n
        for j in range(1, n + 1):
            add, delete = previous[j] + 1, current[j - 1] + 1
            change = previous[j - 1]
            if a[j - 1] != b[i - 1]:
                change = change + 1
            current[j] = min(add, delete, change)
    return current[n]

if __name__=="__main__":
    for pair in open("wordfile.txt", "r"):
        a,b = pair.split()
        lev(a, b)

2 Answers


Assuming the problem is passing the words to lev, and assuming your word file looks like this:

bat, man
cat, goat
foo, bar

then you can do something like this:

if __name__ == '__main__':
    with open("wordfile", "r") as word_file:
        for pair in word_file:
            # break around the comma, then strip spaces and the trailing newline
            a, b = [word.strip() for word in pair.split(',')]

            # pass these words to lev and show the result
            print(a, b, lev(a, b))
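
As a small sketch of the parsing step, the splitting can be pulled into a helper that also drops the trailing newline, which would otherwise end up inside the second word. The name `split_pair` is hypothetical, and the one-comma-per-line format is the same assumption made about the word file above.

```python
def split_pair(line):
    """Split a line such as 'bat, man\n' into its two words.

    Hypothetical helper: assumes exactly one comma per line.
    """
    first, second = line.split(',')
    # strip() removes surrounding spaces and the trailing newline
    return first.strip(), second.strip()


print(split_pair("bat, man\n"))  # ('bat', 'man')
```

The returned pair can then be unpacked directly, for example `a, b = split_pair(line)`, before calling `lev(a, b)`.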
answered 2012-10-09T16:11:45.787

With the help of Abhishek's answer and comments, I found the answer to this problem. Here is the final working script, in case anyone else needs it:

def memoize(f):
    cache = {}
    def wrapper(*args, **kwargs):
        try:
            return cache[args]
        except KeyError:
            result = f(*args, **kwargs)
            cache[args] = result
            return result
    return wrapper

@memoize
def lev(a,b):
    n, m = len(a), len(b)
    if n > m:
        a, b = b, a
        n, m = m, n
    current = range(n + 1)
    for i in range(1, m + 1):
        previous, current = current, [i] + [0] * n
        for j in range(1, n + 1):
            add, delete = previous[j] + 1, current[j - 1] + 1
            change = previous[j - 1]
            if a[j - 1] != b[i - 1]:
                change = change + 1
            current[j] = min(add, delete, change)
    return current[n]

if __name__ == "__main__":
    with open('wordfile.txt', 'r') as word_file:
        for line in word_file:
            # treat commas as whitespace so "word1, word2" and "word1 word2" both parse
            a, b = line.replace(',', ' ').split()
            print(a, b, lev(a, b))
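
For comparison, here is a minimal sketch of a recursive formulation, where memoization does real work because the same pairs of suffixes are reached along many different paths. It uses `functools.lru_cache` from the standard library in place of the hand-written `memoize`. The name `lev_recursive` is hypothetical; this is not the script above, only an illustration that should give the same distance.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def lev_recursive(a, b):
    """Levenshtein distance between strings a and b (recursive sketch)."""
    if not a:
        return len(b)  # insert every remaining character of b
    if not b:
        return len(a)  # delete every remaining character of a
    # substitution costs nothing when the first characters already match
    cost = 0 if a[0] == b[0] else 1
    return min(
        lev_recursive(a[1:], b) + 1,         # delete a[0]
        lev_recursive(a, b[1:]) + 1,         # insert b[0]
        lev_recursive(a[1:], b[1:]) + cost,  # substitute or keep
    )


print(lev_recursive("kitten", "sitting"))  # 3
```

Without the cache this recursion repeats the same subproblems exponentially often. For long words the recursion depth may become a concern, in which case the iterative version is the safer choice.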
answered 2012-10-09T18:57:44.350