我正在玩在 Haskell 中计算Levenshtein 距离,并且对以下性能问题感到有点沮丧。如果您为 Haskell 实现最“正常”的方式,如下所示(dist),一切正常:
dist :: (Ord a) => [a] -> [a] -> Int
dist s1 s2 = ldist s1 s2 (L.length s1, L.length s2)
ldist :: (Ord a) => [a] -> [a] -> (Int, Int) -> Int
ldist _ _ (0, 0) = 0
ldist _ _ (i, 0) = i
ldist _ _ (0, j) = j
ldist s1 s2 (i+1, j+1) = output
where output | (s1!!(i)) == (s2!!(j)) = ldist s1 s2 (i, j)
| otherwise = 1 + L.minimum [ldist s1 s2 (i, j)
, ldist s1 s2 (i+1, j)
, ldist s1 s2 (i, j+1)]
但是,如果你稍微弯曲你的大脑并将其实现为 dist',它的执行速度会快得多(大约 10 倍)。
dist' :: (Ord a) => [a] -> [a] -> Int
dist' o1 o2 = (levenDist o1 o2 [[]])!!0!!0
levenDist :: (Ord a) => [a] -> [a] -> [[Int]] -> [[Int]]
levenDist s1 s2 arr@([[]]) = levenDist s1 s2 [[0]]
levenDist s1 s2 arr@([]:xs) = levenDist s1 s2 ([(L.length arr) -1]:xs)
levenDist s1 s2 arr@(x:xs) = let
n1 = L.length s1
n2 = L.length s2
n_i = L.length arr
n_j = L.length x
match | (s2!!(n_j-1) == s1!!(n_i-2)) = True | otherwise = False
minCost = if match then (xs!!0)!!(n2 - n_j + 1)
else L.minimum [(1 + (xs!!0)!!(n2 - n_j + 1))
, (1 + (xs!!0)!!(n2 - n_j + 0))
, (1 + (x!!0))
]
dist | (n_i > n1) && (n_j > n2) = arr
| n_j > n2 = []:arr `seq` levenDist s1 s2 $ []:arr
| n_i == 1 = (n_j:x):xs `seq` levenDist s1 s2 $ (n_j:x):xs
| otherwise = (minCost:x):xs `seq` levenDist s1 s2 $ (minCost:x):xs
in dist
我已经尝试seq
了第一个版本中的所有常用技巧,但似乎没有什么可以加快速度。这对我来说有点不满意,因为我希望第一个版本更快,因为它不需要评估整个矩阵,只需要评估它需要的部分。
有谁知道是否有可能让这两个实现类似地执行,或者我只是在后者中获得尾递归优化的好处,因此如果我想要性能,就需要忍受它的不可读性?
谢谢,猎户座