I'm trying to find a substring from a string text
that is an anagram to a string pattern
.
My Question: Could the Rabin-Karp algorithm be adjusted to this purpose? Or are there better algorithms?
I have tried out a brute-force algorithm, which did not work in my case because the text and the pattern can each be up to one million characters.
Update: I've heard there is a worst-case O(n2) algorithm that uses O(1) space. Does anyone know what this algorithm is?
Update 2: For reference, here is pseudocode for the Rabin-Karp algorithm:
function RabinKarp(string s[1..n], string sub[1..m])
hsub := hash(sub[1..m]); hs := hash(s[1..m])
for i from 1 to n-m+1
if hs = hsub
if s[i..i+m-1] = sub
return i
hs := hash(s[i+1..i+m])
return not found
This uses a rolling hash function to allow calculating the new hash in O(1),
so the overall search is O(nm) in the worst-case, but with a good hash function is O(m + n) in the best case. Is there a rolling hash function that would produce few collisions
when searching for anagrams of the string?