An Empirical Comparison of Sorting Algorithms
Which sorting algorithm is fastest? Asymptotic complexity analysis lets us distinguish between \(\Theta(n^2)\) and \(\Theta(n \log n)\) algorithms, but it does not help distinguish between algorithms with the same asymptotic complexity. Nor does asymptotic analysis say anything about which algorithm is best for sorting small lists. For answers to these questions, we can turn to empirical testing.
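Empirical testing needs a careful harness. The following is a minimal Python sketch, not the code used to produce Table 1; Python timings will not be comparable with the table's figures. The helper names `time_sort` and `insertion_sort` are hypothetical. The sketch does three things: it sorts a fresh copy of the input on every run, keeps the best of several runs to reduce timer noise, and checks that the output is actually sorted.

```python
import random
import time
from typing import Callable, List


def insertion_sort(a: List[int]) -> None:
    """Sort a in place by inserting each element into the sorted prefix."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]  # shift larger elements one slot right
            j -= 1
        a[j + 1] = key


def time_sort(sort_fn: Callable[[List[int]], None],
              data: List[int], repeats: int = 5) -> float:
    """Hypothetical helper: best wall-clock time in milliseconds over `repeats` runs.

    Each run sorts a fresh copy, so later runs never see already-sorted input,
    and the result is checked outside the timed region.
    """
    expected = sorted(data)
    best = float("inf")
    for _ in range(repeats):
        trial = list(data)  # copy, because sort_fn works in place
        start = time.perf_counter()
        sort_fn(trial)
        elapsed = time.perf_counter() - start
        if trial != expected:
            raise AssertionError(f"{sort_fn.__name__} produced unsorted output")
        best = min(best, elapsed)
    return best * 1000.0


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducible inputs
    for n in (10, 100, 1000):
        data = [rng.randrange(1_000_000) for _ in range(n)]  # both sorts time this same list
        print(f"n={n:>5}  insertion sort: {time_sort(insertion_sort, data):8.3f} ms"
              f"  list.sort: {time_sort(lambda a: a.sort(), data):8.3f} ms")
```

Taking the minimum rather than the mean is a design choice. It assumes that slower runs are inflated by outside interference, not by the algorithm itself.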
Table 1
Empirical comparison of sorting algorithms run on a 3.4 GHz Intel Pentium 4 CPU running Linux. All times are in milliseconds.
Table 1 shows timing results for actual implementations of the sorting algorithms presented in this chapter. The algorithms compared are Insertion Sort, Bubble Sort, Selection Sort, Shellsort, Quicksort, Mergesort, Heapsort, and Radix Sort.
For Shellsort, the table gives times for both the basic version and a version whose increments are based on division by three. For Mergesort, it gives times for both the basic array-based implementation and an optimized version that calls Insertion Sort on lists of length below nine. For Quicksort, two versions are compared: the basic implementation and an optimized version that does not partition sublists below length nine, instead finishing with a single Insertion Sort pass over the whole array. The first Heapsort version uses a standard class definition with methods that implement access functions such as "parent". The second version removes all the method definitions and operates directly on the array, using inlined code for all access functions.
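The optimized Quicksort described above can be sketched as follows. This is a minimal illustration, not the implementation timed in Table 1. The cutoff of nine comes from the text. The middle-element pivot, the Lomuto-style partition, and the function names (`optimized_quicksort`, `partition`, `_quicksort_skip_small`) are assumptions made for this sketch.

```python
from typing import List

THRESHOLD = 9  # from the text: sublists shorter than this are not partitioned


def insertion_sort(a: List[int]) -> None:
    """Final pass: cheap here because every element is already close to its place."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key


def partition(a: List[int], lo: int, hi: int) -> int:
    """Assumed scheme: Lomuto partition of a[lo..hi] around the middle element.

    Returns the pivot's final index."""
    mid = (lo + hi) // 2
    a[mid], a[hi] = a[hi], a[mid]  # park the pivot at the end
    pivot = a[hi]
    store = lo
    for k in range(lo, hi):
        if a[k] < pivot:
            a[k], a[store] = a[store], a[k]
            store += 1
    a[store], a[hi] = a[hi], a[store]  # pivot lands in its final position
    return store


def _quicksort_skip_small(a: List[int], lo: int, hi: int) -> None:
    # Keep partitioning only while the sublist has at least THRESHOLD elements.
    while hi - lo + 1 >= THRESHOLD:
        p = partition(a, lo, hi)
        # Recurse on the smaller side and loop on the larger one to bound stack depth.
        if p - lo < hi - p:
            _quicksort_skip_small(a, lo, p - 1)
            lo = p + 1
        else:
            _quicksort_skip_small(a, p + 1, hi)
            hi = p - 1


def optimized_quicksort(a: List[int]) -> None:
    """Hypothetical name: partition down to short runs, then one Insertion Sort pass."""
    _quicksort_skip_small(a, 0, len(a) - 1)
    insertion_sort(a)
```

Deferring Insertion Sort to a single final call avoids one function call per small sublist. The final pass stays cheap because no element can be more than `THRESHOLD - 1` positions from its sorted place.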
Except for the rightmost columns, the input to each algorithm is a random array of integers. This choice affects the timing for some of the sorting algorithms. For example, Selection Sort is not used to best advantage. Its strength is that it performs at most \(n-1\) swaps, which pays off when records are large and expensive to move. Here the records are small, so it does not get the best possible showing. The Radix Sort implementation does take advantage of the key range, in that it does not look at more digits than necessary. On the other hand, it was not optimized to use bit shifting instead of division, even though the bases used would permit this.
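Counting operations instead of time makes the Selection Sort point concrete. The function below is a hypothetical instrumented version written for illustration, not the chapter's code. It always makes \(n(n-1)/2\) comparisons but swaps at most once per pass. That saving only matters when a swap costs much more than a comparison, that is, when records are large.

```python
import random
from typing import List, Tuple


def selection_sort(a: List[int]) -> Tuple[int, int]:
    """Hypothetical instrumented Selection Sort: sorts a in place and
    returns (comparisons, swaps)."""
    comparisons = swaps = 0
    n = len(a)
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):  # scan the unsorted suffix for its minimum
            comparisons += 1
            if a[j] < a[smallest]:
                smallest = j
        if smallest != i:  # at most one swap per pass
            a[i], a[smallest] = a[smallest], a[i]
            swaps += 1
    return comparisons, swaps


if __name__ == "__main__":
    data = [random.randrange(1_000_000) for _ in range(1000)]
    c, s = selection_sort(data)
    print(f"comparisons={c}, swaps={s}")  # comparisons is always 499500 for n=1000
```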
The various sorting algorithms are shown for lists of sizes 10, 100, 1000, 10,000, 100,000, and 1,000,000. The final two columns of the table show the performance of the algorithms on inputs of size 10,000 whose values are in ascending (sorted) and descending (reverse sorted) order, respectively. These columns show best-case performance for some algorithms and worst-case performance for others. They also show that for some algorithms, the order of the input has little effect.
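Insertion Sort shows the best-case/worst-case contrast most clearly. The sketch below counts key comparisons, which is a machine-independent proxy for time, on ascending and descending input. It uses \(n = 1000\) rather than the table's 10,000 to keep the run short, and the name `insertion_sort_count` is hypothetical. Sorted input costs \(n-1\) comparisons. Reverse-sorted input costs \(n(n-1)/2\).

```python
from typing import List


def insertion_sort_count(a: List[int]) -> int:
    """Hypothetical instrumented Insertion Sort: sorts a in place and
    returns the number of key comparisons made."""
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] <= key:  # stop at the first element not larger than key (keeps it stable)
                break
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return comparisons


if __name__ == "__main__":
    n = 1000
    print(insertion_sort_count(list(range(n))))         # 999 = n - 1 (best case)
    print(insertion_sort_count(list(range(n, 0, -1))))  # 499500 = n(n-1)/2 (worst case)
```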
These figures show a number of interesting results. As expected, the \(O(n^2)\) sorts perform quite poorly on large arrays. Insertion Sort is by far the best of this group, unless the array is already reverse sorted. Shellsort is clearly superior to any of these \(O(n^2)\) sorts for lists of as few as 100 records. Optimized Quicksort is clearly the best overall algorithm for all but lists of 10 records. Even for small arrays, optimized Quicksort performs well because it does one partition step before calling Insertion Sort. Compared to the other \(O(n \log n)\) sorts, unoptimized Heapsort is quite slow due to the overhead of the class structure. When this overhead is stripped away and the algorithm is implemented to manipulate the array directly, it is still somewhat slower than Mergesort. In general, optimizing the various algorithms makes a noticeable improvement for larger array sizes.
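The "stripped" Heapsort style can be sketched on a plain list, with no heap class and no accessor methods. This is an illustrative sketch under stated assumptions, not the version timed in Table 1. It assumes a 0-based max-heap layout in which the children of position `i` sit at `2*i + 1` and `2*i + 2`, and those index calculations are written inline. The `sift_down` helper is kept as a single local function for readability, and the name `heapsort` is hypothetical.

```python
from typing import List


def heapsort(a: List[int]) -> None:
    """Hypothetical in-place Heapsort on a plain list with inlined index arithmetic."""
    n = len(a)

    def sift_down(pos: int, size: int) -> None:
        # Move a[pos] down until it is no smaller than its children within a[0:size].
        item = a[pos]
        child = 2 * pos + 1  # left child, computed inline rather than via a method
        while child < size:
            right = child + 1
            if right < size and a[right] > a[child]:
                child = right  # pick the larger child
            if a[child] <= item:
                break
            a[pos] = a[child]  # pull the larger child up one level
            pos = child
            child = 2 * pos + 1
        a[pos] = item

    # Build a max-heap bottom-up, starting at the last internal node.
    for pos in range(n // 2 - 1, -1, -1):
        sift_down(pos, n)
    # Repeatedly swap the maximum to the end and shrink the heap by one.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
```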
Overall, Radix Sort is a surprisingly poor performer. If the code had been tuned to use bit shifting of the key value, it would likely improve substantially, but this would seriously limit the range of record types that the sort could support.
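The bit-shifting tuning mentioned above might look like the following least-significant-digit Radix Sort. This is a minimal sketch, not the implementation behind Table 1. The name `radix_sort_shift` and the default of 8 bits per digit are assumptions. Digits are extracted with `(key >> shift) & mask`, where a division-based version would compute `(key // radix**i) % radix`. The sketch also illustrates the limitation the text describes: it only works for non-negative integer keys and power-of-two bases. Like the implementation the text describes, it stops once the largest key has no digits left.

```python
from typing import List


def radix_sort_shift(a: List[int], bits_per_digit: int = 8) -> List[int]:
    """Hypothetical LSD Radix Sort for non-negative ints, base 2**bits_per_digit.

    Returns a new sorted list; digits come from shifts and masks, not division."""
    if not a:
        return []
    if min(a) < 0:
        raise ValueError("this sketch handles non-negative integers only")
    radix = 1 << bits_per_digit
    mask = radix - 1
    out = list(a)
    max_key = max(out)
    shift = 0
    # Only as many passes as the largest key has digits.
    while (max_key >> shift) > 0:
        count = [0] * radix
        for key in out:
            count[(key >> shift) & mask] += 1
        # Convert counts to starting positions (exclusive prefix sums).
        total = 0
        for d in range(radix):
            count[d], total = total, total + count[d]
        buf = [0] * len(out)
        for key in out:  # forward scan keeps each pass stable, as LSD requires
            d = (key >> shift) & mask
            buf[count[d]] = key
            count[d] += 1
        out = buf
        shift += bits_per_digit
    return out
```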