10.3. Collision Resolution¶
10.3.1. Collision Resolution¶
We now turn to the most commonly used form of hashing: closed hashing with no bucketing, and a collision resolution policy that can potentially use any slot in the hash table.
During insertion, the goal of collision resolution is to find a free slot in the hash table when the home position for the record is already occupied. We can view any collision resolution method as generating a sequence of hash table slots that can potentially hold the record. The first slot in the sequence will be the home position for the key. If the home position is occupied, then the collision resolution policy goes to the next slot in the sequence. If this is occupied as well, then another slot must be found, and so on. This sequence of slots is known as the probe sequence, and it is generated by some probe function that we will call p. Insertion works as follows:
// Insert e into hash table HT
void hashInsert(const Key& k, const Elem& e) {
int home; // Home position for e
int pos = home = h(k); // Init probe sequence
for (int i=1; EMPTYKEY != (HT[pos]).key(); i++) {
pos = (home + p(k, i)) % M; // probe
if (k == HT[pos].key()) {
println("Duplicates not allowed");
return;
}
}
HT[pos] = e;
}
Method hashInsert
first checks to see if the home slot for the
key is empty.
If the home slot is occupied, then we use the probe function
\(\textbf{p}(k, i)\) to locate a free slot in the table.
Function p has two parameters, the key \(k\) and a
count \(i\) of where in the probe sequence we wish to be.
That is, to get the first position in the probe sequence after the
home slot for key \(K\), we call \(\textbf{p}(K, 1)\).
For the next slot in the probe sequence, call \(\textbf{p}(K, 2)\).
Note that the probe function returns an offset from the original home
position, rather than a slot in the hash table.
Thus, the for
loop in hashInsert
is computing positions
in the table at each iteration by adding the value returned from the
probe function to the home position.
The \(i\) th call to p returns the \(i\) th offset to be used.
Searching in a hash table follows the same probe sequence that was followed when inserting records. In this way, a record not in its home position can be recovered. An implementation for the search procedure is as follows.:
// Search for the record with Key K
bool hashSearch(const Key& K, Elem& e) const {
int home; // Home position for K
int pos = home = h(K); // Initial position is the home slot
for (int i = 1;
(K != (HT[pos]).key()) && (EMPTYKEY != (HT[pos]).key());
i++)
pos = (home + p(K, i)) % M; // Next on probe sequence
if (K == (HT[pos]).key()) { // Found it
e = HT[pos];
return true;
}
else return false; // K not in hash table
}
Both the insert and the search routines assume that at least one slot on the probe sequence of every key will be empty. Otherwise they will continue in an infinite loop on unsuccessful searches. Thus, the hash system should keep a count of the number of records stored, and refuse to insert into a table that has only one free slot.
The simplest approach to collsion resolution is simply to move down the table from the home slot until a free slot is found. This is known as linear probing. The probe function for simple linear probing is \(\textbf{p}(K, i) = i\). That is, the \(i\) th offset on the probe sequence is just \(i\), meaning that the \(i\) th step is simply to move down \(i\) slots in the table. Once the bottom of the table is reached, the probe sequence wraps around to the beginning of the table (since the last step is to mod the result to the table size). Linear probing has the virtue that all slots in the table will be candidates for inserting a new record before the probe sequence returns to the home position.
Can you see any reason why this might not be the best approach to collision resolution?
10.3.1.1. The Problem with Linear Probing¶
While linear probing is probably the first idea that comes to mind when considering collision resolution policies, it is not the only one possible. Probe function p allows us many options for how to do collision resolution. In fact, linear probing is one of the worst collision resolution methods. The main problem is illustrated by the next slideshow.
Again, the ideal behavior for a collision resolution mechanism is that each empty slot in the table will have equal probability of receiving the next record inserted (assuming that every slot in the table has equal probability of being hashed to initially). This tendency of linear probing to cluster items together is known as primary clustering. Small clusters tend to merge into big clusters, making the problem worse. The objection to primary clustering is that it leads to long probe sequences.