27.2. Analyzing Search in Unsorted Arrays¶
You already know the simplest form of search: the sequential search algorithm. Sequential search on an unsorted list requires \(\Theta(n)\) time in the worst case.
How many comparisons does linear search do on average? A major consideration is whether \(K\) is in list L at all. We can simplify our analysis by ignoring everything about the input except the position of \(K\) if it is found in L. Thus, we have \(n+1\) distinct possible events: That \(K\) is in one of positions 0 to \(n-1\) in L (each position having its own probability), or that it is not in \(L\) at all. We can express the probability that \(K\) is not in L as
where \(\mathbf{P}(x)\) is the probability of event \(x\).
Let \(p_i\) be the probability that \(K\) is in position \(i\) of L (indexed from 0 to \(n-1\). For any position \(i\) in the list, we must look at \(i+1\) records to reach it. So we say that the cost when \(K\) is in position \(i\) is \(i+1\). When \(K\) is not in L, sequential search will require \(n\) comparisons. Let \(p_n\) be the probability that \(K\) is not in L. Then the average cost \(\mathbf{T}(n)\) will be
What happens to the equation if we assume all the \(p_i\) 's are equal (except \(p_n\))?
Depending on the value of \(p_n\), \(\frac{n+1}{2} \leq \mathbf{T}(n) \leq n\).