Correcting single-word spelling mistakes (both non-word and real-word errors) is easy: choose the candidate that maximizes
c* = argmax_c P(w|c) P(c)
Where w is the misspelled word and c is the candidate we're trying to match, with every candidate being a single-word token.
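For reference, here is a minimal Python sketch of that single-word case. It assumes a toy unigram count table for P(c) and a crude constant error model for P(w|c); the counts, the 0.95/0.05 values and the function names are hypothetical, chosen only for illustration. Candidates are generated as all strings one edit away from w that appear in the dictionary.

```python
from collections import Counter

# Hypothetical toy counts standing in for a real unigram language model P(c).
WORD_COUNTS = Counter({"spelling": 50, "check": 40, "apple": 30, "apply": 20, "spell": 10})
TOTAL = sum(WORD_COUNTS.values())

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + ch + b[1:] for a, b in splits if b for ch in LETTERS]
    inserts = [a + ch + b for a, b in splits for ch in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def p_word(c):
    """Unigram prior P(c) estimated from the toy counts."""
    return WORD_COUNTS[c] / TOTAL


def p_error(w, c):
    """Crude error model P(w|c): assumed 0.95 for an exact match, 0.05 for one edit."""
    return 0.95 if w == c else 0.05


def correct(w):
    """Return the single-token candidate c maximizing P(w|c) * P(c)."""
    candidates = {c for c in edits1(w) | {w} if c in WORD_COUNTS}
    if not candidates:
        return w  # nothing in the dictionary is close enough; leave w unchanged
    return max(candidates, key=lambda c: p_error(w, c) * p_word(c))
```

With these toy counts, `correct("appla")` prefers "apple" over "apply" because both are one edit away and "apple" has the higher prior. `correct("spelligncheck")` comes back unchanged, since no single-token candidate lies within one edit of it.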
But when you type something like spelligncheck into Google, it corrects it into two separate words. Now, P(w|c) is easy to compute here if I use Levenshtein distance. But that means my candidates can no longer be single words (single tokens, rather), so the size of my dictionary would grow exponentially.
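A short Python sketch of the Levenshtein part, for concreteness. The `alpha ** distance` error model is a hypothetical choice (one common simplification), not something any particular system is known to use. Note that scoring "spelling check" against the input requires the two-word string to exist as a candidate, which is exactly the dictionary-size problem described above.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def p_error(w, c, alpha=0.1):
    """Hypothetical error model: P(w|c) proportional to alpha ** distance."""
    return alpha ** levenshtein(w, c)
```

For example, `levenshtein("kitten", "sitting")` is 3, and `levenshtein("app le", "apple")` is 1 (a single deleted space).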
Moreover, when I enter app le, Google corrects it to apple...
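One technique that handles the app le case with only a single-token dictionary is dictionary-based word segmentation. It is named here as a swapped-in illustration; I don't know whether Google does anything like this. The sketch below drops the spaces and uses dynamic programming to find the split into tokens that maximizes the sum of unigram log-probabilities. The counts and the unknown-token penalty are hypothetical.

```python
import math
from functools import lru_cache

# Hypothetical single-token dictionary with toy unigram counts.
WORD_COUNTS = {"spelling": 50, "check": 40, "apple": 30, "app": 5, "spell": 10}
TOTAL = sum(WORD_COUNTS.values())


def log_p(token):
    """Unigram log-probability; unknown tokens get a penalty that grows with length."""
    if token in WORD_COUNTS:
        return math.log(WORD_COUNTS[token] / TOTAL)
    return math.log(1 / TOTAL) - 5 * len(token)  # hypothetical penalty


def segment(text):
    """Drop spaces, then split the characters into the most probable token sequence."""
    s = text.replace(" ", "")

    @lru_cache(maxsize=None)
    def best(i):
        # Returns (score, tokens) for the best segmentation of the suffix s[i:].
        if i == len(s):
            return 0.0, ()
        options = []
        for j in range(i + 1, len(s) + 1):
            score, rest = best(j)
            options.append((log_p(s[i:j]) + score, (s[i:j],) + rest))
        return max(options)

    return list(best(0)[1])
```

Here `segment("app le")` returns `["apple"]` and `segment("spellingcheck")` returns `["spelling", "check"]`. On its own this cannot repair the transposed letters in spelligncheck; that would need each segment to be scored with an error model such as the one above.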
So what is the best way to do multi-word spelling correction, given a single-token dictionary?