这可以在 O(|S| + |L| + k) 时间内解决,其中 k 是 S 中来自 L 的所有字符串的匹配总数。主要有两个步骤:
运行Aho-Corasick。这将为您提供 S 中 L 中任何字符串的所有匹配项。这与上述运行时间相同。
初始化长度为 |S| 的整数数组 A + 1 到全零。遍历数组,在位置 i 将 A[i] 设置为 A[i-1],如果它更大,那么对于每个匹配项 M,从位置 i 的 S 中的 L,将 A[i+|M|] 设置为A[i+|M|] 和 A[i] + |M| 的最大值。
这是 Go 中的一些代码,正是这样做的。它使用我编写的一个包,该包有一个方便的包装器,用于调用 Aho-Corasick。
package main
import (
"fmt"
"github.com/runningwild/stringz"
)
func main() {
input := []byte("thenuke")
patterns := [][]byte{[]byte("hen"), []byte("thenu"), []byte("uke")}
// This runs Aho-Corasick on the input string and patterns, it returns a
// map, matches, such that matches[i] is a list of indices into input where
// patterns[i] matches the input string. The running time of this is
// O(|input| + |patterns| + k) and uses O(|patterns| + k) auxillary storage,
// where k is the total number of matches found.
find := stringz.FindSet(patterns)
matches := find.In(input)
// We want to invert the map so that it maps from index to all strings that
// match at that index.
at_pos := make([][]int, len(input)+1)
for index, hits := range matches {
for _, hit := range hits {
at_pos[hit] = append(at_pos[hit], index)
}
}
// Now we do a single pass through the string, at every position we will
// keep track of how many characters in the input string we can match up to
// that point.
max := make([]int, len(input)+1)
for i := range max {
// If this position isn't as good as the previous position, then we'll use
// the previous position. It just means that there is a character that we
// were unable to match anything to.
if i > 0 && max[i-1] > max[i] {
max[i] = max[i-1]
}
// Look through any string that matches at this position, if its length is
// L, then L positions later in the string we can have matched L more
// character if we use this string now. This might mean that we don't use
// another string that earlier we thought we'd be matching right now, we'll
// find out later which one was better.
for _, hit := range at_pos[i] {
L := len(patterns[hit])
if i+L < len(max) && max[i+L] < max[i]+L {
max[i+L] = max[i] + L
}
}
}
fmt.Printf("%v\n", max)
}