27.2. Analyzing Search in Unsorted Arrays¶

You already know the simplest form of search: the sequential search algorithm. Sequential search on an unsorted list requires $\Theta(n)$ time in the worst case.

How many comparisons does linear search do on average? A major consideration is whether $K$ is in list L at all. We can simplify our analysis by ignoring everything about the input except the position of $K$ if it is found in L. Thus, we have $n+1$ distinct possible events: That $K$ is in one of positions 0 to $n-1$ in L (each position having its own probability), or that it is not in $L$ at all. We can express the probability that $K$ is not in L as

$\mathbf{P}(K \notin \mathbf{L}) = 1 - \sum_{i=1}^n \mathbf{P}(K = \mathbf{L}[i])$

where $\mathbf{P}(x)$ is the probability of event $x$ .

Let $p_i$ be the probability that $K$ is in position $i$ of L (indexed from 0 to $n-1$ . For any position $i$ in the list, we must look at $i+1$ records to reach it. So we say that the cost when $K$ is in position $i$ is $i+1$ . When $K$ is not in L, sequential search will require $n$ comparisons. Let $p_n$ be the probability that $K$ is not in L. Then the average cost $\mathbf{T}(n)$ will be

$\mathbf{T}(n) = n p_n + \sum_{i=0}^{n-1} (i+1) p_i.$

What happens to the equation if we assume all the $p_i$ 's are equal (except $p_n$ )?

$\begin{split}\mathbf{T}(n) &=& p_n n + \sum_{i=0}^{n-1} (i+1) p\\ &=& p_n n + p\sum_{i=1}^n i\\ &=& p_n n + p\frac{n(n+1)}{2}\\ &=& p_n n + \frac{1 - p_n}{n}\frac{n(n+1)}{2}\\ &=& \frac{n + 1 + p_n(n-1)}{2}\end{split}$

Depending on the value of $p_n$ , $\frac{n+1}{2} \leq \mathbf{T}(n) \leq n$ .

Chapter 27 Searching

27.2. Analyzing Search in Unsorted Arrays¶