State Space Lower Bounds Proofs¶
We now consider the problem of finding both the minimum and the maximum from an (unsorted) list of values. This might be useful if we want to know the range of a collection of values to be plotted, for the purpose of drawing the plot’s scales. Of course we could find them independently in \(2n-2\) comparisons. A slight modification is to find the maximum in \(n-1\) comparisons, remove it from the list, and then find the minimum in \(n-2\) further comparisons for a total of \(2n-3\) comparisons. Can we do better than this?
Before continuing, think a moment about how this problem of finding the minimum and the maximum compares to the problem of the last section, that of finding the second biggest value (and by implication, the maximum). Which of these two problems do you think is harder? It is probably not at all obvious to you that one problem is harder or easier than the other. There is intuition that argues for either case. On the one hand intuition might argue that the process of finding the maximum should tell you something about the second biggest value, more than that process should tell you about the minimum value. On the other hand, any given comparison tells you something about which of two can be a candidate for maximum value, and which can be a candidate for minimum value, thus making progress in both directions.
We will start by considering a simple divide-and-conquer approach to finding the minimum and maximum. Split the list into two parts and find the minimum and maximum elements in each part. Then compare the two minimums and maximums to each other with a further two comparisons to get the final result. The algorithm is as follows:
// Return the minimum and maximum values in A between positions l and r
void MinMax(int A[], int l, int r, int Out[]) {
if (l == r) { // n=1
Out[0] = A[r];
Out[1] = A[r];
}
else if (l+1 == r) { // n=2
Out[0] = Math.min(A[l], A[r]);
Out[1] = Math.max(A[l], A[r]);
}
else { // n>2
int[] Out1 = new int[2];
int[] Out2 = new int[2];
int mid = (l + r)/2;
MinMax(A, l, mid, Out1);
MinMax(A, mid+1, r, Out2);
Out[0] = Math.min(Out1[0], Out2[0]);
Out[1] = Math.max(Out1[1], Out2[1]);
}
}
The cost of this algorithm can be modeled by the following recurrence.
This is a rather interesting recurrence, and its solution ranges between \(3n/2 - 2\) (when \(n = 2^i\) or \(n=2^1 \pm 1\)) and \(5n/3 - 2\) (when \(n = 3 \times 2^i\)). We can infer from this behavior that how we divide the list affects the performance of the algorithm. For example, what if we have six items in the list? If we break the list into two sublists of three elements, the cost would be 8. If we break the list into a sublist of size two and another of size four, then the cost would only be 7.
With divide and conquer, the best algorithm is the one that minimizes the work, not necessarily the one that balances the input sizes. One lesson to learn from this example is that it can be important to pay attention to what happens for small sizes of \(n\), because any division of the list will eventually produce many small lists.
We can model all possible divide-and-conquer strategies for this problem with the following recurrence.
That is, we want to find a way to break up the list that will minimize the total work. If we examine various ways of breaking up small lists, we will eventually recognize that breaking the list into a sublist of size 2 and a sublist of size (n-2) will always produce results as good as any other division. This strategy yields the following recurrence.
This recurrence (and the corresponding algorithm) yields \(\mathbf{T}(n) = \lceil 3n/2 \rceil - 2\) comparisons. Is this optimal? We now introduce yet another tool to our collection of lower bounds proof techniques: The state space proof.
We will model our algorithm by defining a state that the algorithm must be in at any given instant. We can then define the start state, the end state, and the transitions between states that any algorithm can support. From this, we will reason about the minimum number of states that the algorithm must go through to get from the start to the end, to reach a state space lower bound.
At any given instant, we can track the following four categories of elements:
Untested: Elements that have not been tested.
Winners: Elements that have won at least once, and never lost.
Losers: Elements that have lost at least once, and never won.
Middle: Elements that have both won and lost at least once.
We define the current state to be a vector of four values, \((U, W, L, M)\) for untested, winners, losers, and middles, respectively. For a set of \(n\) elements, the initial state of the algorithm is \((n, 0, 0, 0)\) and the end state is \((0, 1, 1, n-2)\). Thus, every run for any algorithm must go from state \((n, 0, 0, 0)\) to state \((0, 1, 1, n-2)\). We also observe that once an element is identified to be a middle, it can then be ignored because it can neither be the minimum nor the maximum.
Given that there are four types of elements, there are 10 types of comparison. Comparing with a middle cannot be more efficient than other comparisons, so we should ignore those, leaving six comparisons of interest. We can enumerate the effects of each comparison type as follows. If we are in state \((i, j, k, l)\) and we have a comparison, then the state changes are as follows.
Now, let us consider what an adversary will do for the various comparisons. The adversary will make sure that each comparison does the least possible amount of work in taking the algorithm toward the goal state. For example, comparing a winner to a loser is of no value because the worst case result is always to learn nothing new (the winner remains a winner and the loser remains a loser). Thus, only the following five transitions are of interest:
Only the last two transition types increase the number of middles, so there must be \(n-2\) of these. The number of untested elements must go to 0, and the first transition is the most efficient way to do this. Thus, \(\lceil n/2 \rceil\) of these are required. Our conclusion is that the minimum possible number of transitions (comparisons) is \(n + \lceil n/2 \rceil - 2\). Thus, our algorithm is optimal.