algorithm - 原始 Boyer-Moore 算法和 Boyer-Moore-Horspool 算法之间的区别

Question

我无法理解 Horspool 在他的算法中所做的更改。如果您有任何 Boyer-Moore-Horspool 算法的链接，请告诉我。

score 6 · Accepted Answer

以下是我的一些观察：

BM：

Preprocessing complexity: Θ(m + σ)

Worst Case : Θ(nm) If pattern exists Θ(n+m) If pattern doesn't exist" 
Best Case : Θ(n/m) 
Space: Θ(σ) 
Comparisions: Θ(3n)

Preprocessing: Uses Good Suffix and Bad Character Shift.
At every step, it slides the pattern by the max of the slides suggested by 
the two heuristics. So it uses best of the two heuristics at every step.

Boyer Moore algorithm uses the "bad" text character itself to determine the 
shift distance.

Probably the fastest non-indexed text-search algorithm known.  
Works better in natural text and larger text searches.

A little difficult to implement.

Mostly used.

BMH：

Preprocessing complexity : Θ(m + σ)

Worst Case : Θ(nm)
Best Case : Θ(n)
Space : Θ(σ)
Comparisons : Θ(1/σ) to Θ(2/σ+1)

Preprocessing:

Recommends to Use only Bad Character Shift.
uses the rightmost character of the current text window to determine the 
shift distance.

Efficient for small alphabets

Easy to implement.

score 2 · Accepted Answer

我想维基百科条目说明了一切。

编辑：
根据条目中的外部链接之一：

Boyer-Moore 算法中使用的坏字符移位（参见 Boyer-Moore 算法一章）对于小字母表不是很有效，但是当字母表与模式的长度相比较大时，通常情况下在文本编辑器下进行 ASCII 表和普通搜索，它变得非常有用。在实践中单独使用它会产生非常有效的算法。Horspool 建议仅使用窗口最右边字符的坏字符移位来计算 Boyer-Moore 算法中的移位。

预处理阶段的时间复杂度为 O(m+sigma)，空间复杂度为 O(sigma)。

搜索阶段具有二次最坏情况，但可以证明一个文本字符的平均比较次数在 1/sigma 和 2/(sigma+1) 之间。

algorithm - 原始 Boyer-Moore 算法和 Boyer-Moore-Horspool 算法之间的区别

2 回答 2

Related

Reference