33.7. Rabin-Karp String Search Algorithm [Draft]¶
33.7.1. Rabin-Karp String Search Algorithm [Draft]¶
The Rabin-Karp algorithm is based on what could be imagined as a "perfect hash function for strings". We will assume that our strings are drawn from an alphabet with \(C\) possible characters. Denote the characters in string \(S\) by \(s_0, s_1, \ldots s_{n-1}\). Suppose that we have a mapping \(c \rightarrow \hat{c}\) that associates with each character \(c\) an integer \(\hat{c}\) in the range \(0 \ldots c - 1\). Then a "perfect hash function for strings" is:
Suppose that we call this a string's "magic number". In effect it associates each string with a unique number in the base \(C\) number system. However, nothing is perfect -- these magic numbers for strings get very big very quickly. Hence the following sub-algorithm of Rabin-Karp to compute a string's magic number (which is itself known as Horner's polynomial evaluation algorithm) takes this into account by using the \(mod\) operator to avoid an overflow condition.
Slideshow for Horner's Method algorithm for computing Rabin-Karp "magic number" for a string
To check your understanding of this "magic number" computation try the following exercise in using Horner's Method to compute a string's "magic number" in a simple case
Because Horner's Method cannot truly compute a magic number that is unique for every string, the Rabin-Karp algorithm must allow for two different strings having the same magic number. In effect, such a situation represents a "false positive" in which Rabin-Karp thinks it has found a match only to be disappointed. Watch Rabin-Karp in action in the following slideshow.
Finally try this exercise in tracing one step of the Rabin-Karp algorithm using the modified Horner's algorithm to compute the "magic number" of a string.