1

[Reposted from https://cs.stackexchange.com/questions/12986/sliding-window-edit-distance ]

If you have a long string of length n and a shorter string of length m, what is a suitable recurrence to let you compute all n-m+1 Levenshtein distances between the shorter string and all substrings of the longer string of length m?

Can it in fact be done in O(nm) time?

4

2 回答 2

4

Computing the Levenshtein distances for a sliding window boils down to computing the distances between several pairs of vertices in an acyclic directed planar graph that looks like this one (capital letters denote the pairs).

   h a y s t a c k

n  A-B-C-D-E-F-*-*
   |\|\|\|\|\|\|\|
e  *-*-*-*-*-*-*-*
   |\|\|\|\|\|\|\|
e  *-*-A-B-C-D-E-F

The horizontal and vertical arcs have cost 1; the diagonal arcs have cost 0 if the corresponding letters match or 1 otherwise.

Since all of the paired vertices lie on the infinite face, Klein's or Cabello-Chambers's multiple-source shortest paths algorithm can be used to compute the needed distances in time O(m n log (m n)).

To shave the final log (and practically speaking, it's much worse than for, e.g., Dijkstra's algorithm), you might look in Alexander Tiskin's manuscript Semi-local string comparison: Algorithmic techniques and applications, which treats problems similar to this one if not this one itself. (Probably that should be my primary answer, but I haven't read it and know the multiple-source shortest path techniques a lot better.)

It's also possible that, with some additional logic to handle the unidirectional edges, my multiple-source shortest path algorithm with Klein could be made to achieve O(m n).

于 2013-07-01T22:12:07.053 回答
0

It's not exactly what you were asking, but it might help you.

If you want to find the minimal distance from the short word to any of substrings of the long word, there's a simple variation on the Levenshtein Distance Fuzzy substring matching with Levenshtein distance in Python Generally, you set the cost of adding characters at the end or beginning of the string to 0

于 2013-07-01T20:17:58.793 回答