algorithm - Rabin Karp 算法 - 给定输入的最坏情况 O(m*n) 如何？

Question

在 RK 算法的 Top Coder 代码中：

// correctly calculates a mod b even if a < 0
function int_mod(int a, int b)
{
  return (a % b + b) % b;
}

function Rabin_Karp(text[], pattern[])
{
  // let n be the size of the text, m the size of the
  // pattern, B - the base of the numeral system,
  // and M - a big enough prime number

  if(n < m) return; // no match is possible

  // calculate the hash value of the pattern
  hp = 0;
  for(i = 0; i < m; i++) 
    hp = int_mod(hp * B + pattern[i], M);

  // calculate the hash value of the first segment 
  // of the text of length m
  ht = 0;
  for(i = 0; i < m; i++) 
    ht = int_mod(ht * B + text[i], M);

  if(ht == hp) check character by character if the first
               segment of the text matches the pattern;

  // start the "rolling hash" - for every next character in
  // the text calculate the hash value of the new segment
  // of length m; E = (Bm-1) modulo M            
  for(i = m; i < n; i++) {
    ht = int_mod(ht - int_mod(text[i - m] * E, M), M);
    ht = int_mod(ht * B, M);
    ht = int_mod(ht + text[i], M);

    if(ht == hp) check character by character if the
                 current segment of the text matches
                 the pattern; 
  }
}

上面写着

不幸的是，仍有一些情况下，我们必须为文本中的每个起始位置运行“naive”方法的整个内部循环——例如，在字符串“aaaaaaaaaaaaaaaaaaaaaaaa”中搜索模式“aaa”时——所以在在最坏的情况下，我们仍然需要 (n * m) 次迭代。

但是算法不会在第一次迭代本身就停止 - 就像它会看到前三个字母是与针匹配的“a”一样？

score 1 · Accepted Answer

Rabin-Karp 算法不断计算所有text大小为的子串的哈希值M，并将其与的哈希值匹配pattern。现在，可以有多个具有相同哈希值的子字符串。

因此，当pattern和某些子字符串的哈希值text匹配时，我们需要再次迭代它们以确保它们实际上是否相同。

在和的情况下pattern = "AAA"，text = "AAAAAAAAAAAAA"存在O(n)与的哈希值匹配的子字符串pattern。而对于每一场比赛，我们都需要迭代来及时确认O(m)；因此最坏情况的复杂性O(n*m)。

score 1 · Accepted Answer

假设我们要搜索的字符串不是“aaa”，而是某个其他字符串，其哈希值与“aaa”的哈希值相同。然后在每个点都需要进行比较。

当然，我们希望比较比m字符更早失败，但它可能需要 o(m) 个字符。

话虽如此，RK 的一个常见用途是查找所有（重叠）实例，在这种情况下，引用的示例显然是 o(mn)。

algorithm - Rabin Karp 算法 - 给定输入的最坏情况 O(m*n) 如何？

2 回答 2

Related

Reference