Algorithms Foundations
Sorting Algorithms
Comparison-based sorting lower bound, quicksort, mergesort, heapsort, and non-comparison sorts. The foundational algorithms behind efficient data processing and search.
Why This Matters
Sorting is the most fundamental algorithmic primitive. If you can sort, you can search (binary search), find duplicates, compute order statistics, and build efficient data structures. In machine learning, sorting appears in data preprocessing, nearest neighbor search, building decision trees (sorting features by split quality), and computing rank-based metrics like AUC.
The comparison-based sorting lower bound is the cleanest example of a computational lower bound: a proof that no algorithm, no matter how clever, can sort with fewer than $\Omega(n \log n)$ comparisons. This style of argument, the information-theoretic lower bound, appears throughout theoretical CS and ML.
Mental Model
You have $n$ items in an unknown order. You can compare pairs of items ("is $a_i < a_j$?") and rearrange them. How many comparisons are needed?
There are $n!$ possible orderings. Each comparison eliminates at most half of the remaining possibilities (it answers a yes/no question). So you need at least $\log_2(n!) = \Theta(n \log n)$ comparisons to identify the correct ordering.
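The counting argument can be checked numerically. A minimal Python sketch (the function name `comparison_lower_bound` is ours, not from the source):

```python
import math

def comparison_lower_bound(n: int) -> int:
    """Minimum worst-case comparisons for any comparison sort: ceil(log2(n!))."""
    return math.ceil(math.log2(math.factorial(n)))

# The lower bound tracks n*log2(n) closely, as Stirling's approximation predicts.
for n in (5, 10, 100, 1000):
    print(n, comparison_lower_bound(n), round(n * math.log2(n)))
```

For $n = 5$ this gives $\lceil \log_2 120 \rceil = 7$, which is why 7 comparisons suffice (and are necessary) to sort 5 elements in the worst case.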
The Lower Bound
Comparison-Based Sorting Lower Bound
Statement
Any comparison-based sorting algorithm requires at least $\lceil \log_2(n!) \rceil$ comparisons in the worst case. By Stirling's approximation:

$$\log_2(n!) = n \log_2 n - n \log_2 e + O(\log n) = \Omega(n \log n)$$

Therefore, no comparison-based sorting algorithm can run in $o(n \log n)$ worst-case time.
Intuition
Model the algorithm as a binary decision tree. Each internal node is a comparison, each leaf is a permutation (output ordering). The tree must have at least $n!$ leaves (one for each possible input ordering). A binary tree with $L$ leaves has height at least $\log_2 L$. The height is the worst-case number of comparisons.
Proof Sketch
Any deterministic comparison-based algorithm on $n$ elements defines a binary decision tree: at each node, compare two elements and branch left or right. Each leaf corresponds to a specific output permutation. For the algorithm to be correct, every input permutation must reach a leaf with the correct sorted output, so there must be at least $n!$ reachable leaves. A binary tree of height $h$ has at most $2^h$ leaves, so $2^h \ge n!$ and $h \ge \log_2(n!) = \Omega(n \log n)$.
Why It Matters
This is the cleanest example of a computational lower bound via an information-theoretic argument. The technique, counting distinguishable inputs and bounding the information gained per operation, applies far beyond sorting. It shows that quicksort, mergesort, and heapsort are all optimal (up to constant factors) among comparison-based sorts.
Failure Mode
The bound only applies to comparison-based algorithms. If you know extra structure about the input (e.g., all elements are integers in a known range), you can beat $\Omega(n \log n)$ using counting sort or radix sort.
Comparison-Based Sorts
Quicksort
Algorithm: Choose a pivot element. Partition the array into elements less than the pivot and elements greater than or equal to it. Recurse on both parts.
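A minimal in-place sketch of the algorithm, using a uniformly random pivot and the Lomuto partition scheme (one of several standard partition schemes; the source does not prescribe one):

```python
import random

def quicksort(a, lo=0, hi=None):
    """In-place quicksort with a uniformly random pivot (Lomuto partition)."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # Swap a random pivot to the end, then partition around it.
    p = random.randint(lo, hi)
    a[p], a[hi] = a[hi], a[p]
    pivot, i = a[hi], lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]  # pivot lands in its final sorted position i
    quicksort(a, lo, i - 1)
    quicksort(a, i + 1, hi)
```

The random pivot choice is what makes the $O(n \log n)$ expected bound below hold for every input, not just for random inputs.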
Quicksort Expected Runtime
Statement
Quicksort with uniformly random pivot selection runs in $O(n \log n)$ expected time and $O(n^2)$ worst-case time. The expected number of comparisons is $2n \ln n + O(n) \approx 1.39\, n \log_2 n$.
Intuition
A random pivot splits the array into two parts. On average, the split is reasonably balanced (not necessarily exactly half, but good enough). Balanced splits give $O(\log n)$ levels of recursion, each doing $O(n)$ work. The worst case ($O(n^2)$) occurs when every pivot is the minimum or maximum, giving $n$ levels of recursion. But this is exponentially unlikely with random pivots.
Proof Sketch
Let $X_{ij}$ be the indicator that the $i$-th and $j$-th smallest elements are compared. These two elements are compared if and only if one of them is chosen as a pivot before any element between them. This happens with probability $\frac{2}{j - i + 1}$. The expected number of comparisons is:

$$E\left[\sum_{i < j} X_{ij}\right] = \sum_{i < j} \frac{2}{j - i + 1} \le 2n H_n = O(n \log n),$$

where $H_n = \sum_{k=1}^{n} \frac{1}{k} \approx \ln n$ is the $n$-th harmonic number.
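The expected comparison count can be checked empirically. A sketch under the classic accounting model (partitioning an $m$-element array costs $m - 1$ comparisons), compared against the exact closed form $2(n+1)H_n - 4n$ for distinct keys:

```python
import math
import random

def quicksort_comparisons(a):
    """Random-pivot quicksort over distinct elements; returns the comparison count."""
    if len(a) <= 1:
        return 0
    pivot = a[random.randrange(len(a))]
    less = [x for x in a if x < pivot]
    greater = [x for x in a if x > pivot]
    # Partitioning m elements costs m - 1 comparisons in the classic model.
    return (len(a) - 1) + quicksort_comparisons(less) + quicksort_comparisons(greater)

random.seed(0)
n, trials = 200, 200
avg = sum(quicksort_comparisons(random.sample(range(10 * n), n))
          for _ in range(trials)) / trials
H = sum(1.0 / k for k in range(1, n + 1))       # harmonic number H_n
exact = 2 * (n + 1) * H - 4 * n                  # exact E[comparisons], distinct keys
print(round(avg), round(exact))  # the two should agree to within a few percent
```

Note that $2n \ln n$ is only the leading term; for moderate $n$ the $O(n)$ correction in the closed form is significant.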
Why It Matters
Quicksort is the most practically efficient comparison sort due to excellent cache performance and small constant factors. Its in-place nature (no extra memory) makes it the default in most standard library sort implementations (with modifications like introsort to avoid worst-case behavior).
Failure Mode
Deterministic quicksort with a bad pivot rule (e.g., always the first element) degrades to $O(n^2)$ on sorted or nearly-sorted input, exactly the case that occurs most often in practice. Always use random or median-of-three pivots.
Properties: In-place ($O(\log n)$ stack space), not stable, $O(n \log n)$ expected, $O(n^2)$ worst case.
Mergesort
Algorithm: Split the array in half. Recursively sort each half. Merge the two sorted halves in $O(n)$ time.
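A minimal Python sketch of the split-sort-merge recursion (returning a new list rather than sorting in place, which matches mergesort's $O(n)$ extra space):

```python
def mergesort(a):
    """Stable mergesort returning a new sorted list."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left, right = mergesort(a[:mid]), mergesort(a[mid:])
    # Merge two sorted halves in O(n); <= takes from the left on ties,
    # which is what makes the sort stable.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out
```

The only subtle point is the `<=` in the merge: taking from the left half on ties preserves the relative order of equal elements.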
Properties: Not in-place ($O(n)$ extra space), stable, $O(n \log n)$ guaranteed.
Why it matters: Mergesort is the canonical divide-and-conquer algorithm. Its guaranteed $O(n \log n)$ runtime (no $O(n^2)$ worst case) makes it preferable when predictability matters. The stability property (equal elements maintain their relative order) is important for multi-key sorting.
Heapsort
Algorithm: Build a max-heap from the array in $O(n)$ time. Repeatedly extract the maximum and place it at the end.
Properties: In-place ($O(1)$ extra space), not stable, $O(n \log n)$ guaranteed.
Why it matters: Heapsort gives the best of both worlds: in-place like quicksort and guaranteed like mergesort. However, it has poor cache locality in practice, making it slower than quicksort despite the better worst case.
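A minimal in-place sketch: bottom-up heap construction ($O(n)$ total), then $n - 1$ extract-max steps of $O(\log n)$ each:

```python
def heapsort(a):
    """In-place heapsort: build a max-heap in O(n), then extract the max n times."""
    n = len(a)

    def sift_down(i, end):
        # Restore the max-heap property for the subtree rooted at i,
        # considering only positions < end.
        while True:
            left, right, largest = 2 * i + 1, 2 * i + 2, i
            if left < end and a[left] > a[largest]:
                largest = left
            if right < end and a[right] > a[largest]:
                largest = right
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    # Bottom-up heap construction: sift down every internal node. O(n) total.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)
    # Repeatedly swap the max (root) to the end and shrink the heap.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
```

The long-range swaps in `sift_down` (parent at $i$, children at $2i+1$ and $2i+2$) are exactly the source of heapsort's poor cache locality.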
Summary of Comparison Sorts
| Algorithm | Best Case | Average Case | Worst Case | Space | Stable? |
|---|---|---|---|---|---|
| Quicksort | $O(n \log n)$ | $O(n \log n)$ | $O(n^2)$ | $O(\log n)$ | No |
| Mergesort | $O(n \log n)$ | $O(n \log n)$ | $O(n \log n)$ | $O(n)$ | Yes |
| Heapsort | $O(n \log n)$ | $O(n \log n)$ | $O(n \log n)$ | $O(1)$ | No |
Non-Comparison Sorts
These bypass the $\Omega(n \log n)$ lower bound by exploiting structure in the input (integers in a known range).
Counting Sort
Input: $n$ integers in range $[0, k)$. Time: $O(n + k)$. Space: $O(n + k)$. Stable: Yes.
Algorithm: Count occurrences of each value, then compute prefix sums to determine output positions. This is linear when $k = O(n)$.
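The count-then-prefix-sum idea in a minimal Python sketch (function name ours):

```python
def counting_sort(a, k):
    """Stable counting sort for integers in [0, k). O(n + k) time and space."""
    counts = [0] * k
    for x in a:
        counts[x] += 1
    # Exclusive prefix sums: counts[v] becomes the output index
    # of the first occurrence of value v.
    total = 0
    for v in range(k):
        counts[v], total = total, total + counts[v]
    out = [0] * len(a)
    for x in a:            # a left-to-right pass preserves stability
        out[counts[x]] = x
        counts[x] += 1
    return out
```

Note there are no element-to-element comparisons anywhere: the values themselves are used as array indices, which is how the $\Omega(n \log n)$ bound is sidestepped.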
Radix Sort
Input: $n$ integers with $d$ digits in base $b$. Time: $O(d(n + b))$. Space: $O(n + b)$. Stable: Yes.
Algorithm: Sort by least significant digit first, then by the next digit, and so on, using a stable sort (counting sort) at each level. For integers in range $[0, n^c)$, choose $b = n$ and get $d = c$ passes, giving $O(cn)$ time, linear for constant $c$.
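A least-significant-digit sketch for non-negative integers, running a stable counting sort per base-$b$ digit (the base-256 default is our choice for illustration):

```python
def radix_sort(a, base=256):
    """LSD radix sort for non-negative integers, one stable counting pass per digit."""
    if not a:
        return []
    max_val, out = max(a), list(a)
    shift = 1
    while shift <= max_val:
        # Stable counting sort keyed on the current digit (x // shift) % base.
        counts = [0] * base
        for x in out:
            counts[(x // shift) % base] += 1
        total = 0
        for d in range(base):
            counts[d], total = total, total + counts[d]
        nxt = [0] * len(out)
        for x in out:
            d = (x // shift) % base
            nxt[counts[d]] = x
            counts[d] += 1
        out = nxt
        shift *= base
    return out
```

Stability of each pass is essential: after sorting on digit $i$, ties must keep the order established by digits $0, \dots, i-1$.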
Hybrid Sorts in Practice
Production sort routines are not pure quicksort or mergesort. They are hybrids that switch strategies based on input properties:
- Introsort (Musser 1997) starts as quicksort and switches to heapsort once recursion depth exceeds $2 \log_2 n$. This bounds the worst case at $O(n \log n)$ while keeping quicksort's average-case constants. Used in the C++ STL `std::sort` since GCC 4.x and in Microsoft's STL.
- Timsort (Peters 2002) is a stable mergesort variant that detects pre-existing runs of sorted (ascending or descending) elements and merges them, giving $O(n)$ on already-sorted or nearly-sorted input. It is the default sort in CPython's `list.sort` and Java's `Arrays.sort` for object arrays. Auger, Jugé, Nicaud, and Pivoteau (2018) gave a tight analysis of its comparison count.
- pdqsort (Peters 2016, "Pattern-Defeating Quicksort") combines introsort with branchless partitioning and pattern detection for ascending/descending/equal runs. Standard since Rust 1.20 (`slice::sort_unstable`) and in Boost's `pdqsort`.
These hybrids matter because real workloads are not adversarial: input often has partial structure (already-sorted prefixes, many duplicates, small ranges), and the algorithm that exploits that structure wins.
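The introsort idea, a depth budget that triggers a fallback, fits in a few lines. A Python sketch (not any library's actual implementation; the heapsort fallback here leans on the stdlib `heapq` module for brevity):

```python
import heapq
import math
import random

def introsort(a):
    """Sketch of introsort: random-pivot quicksort with a 2*log2(n) depth budget,
    falling back to heapsort when the budget is exhausted."""

    def heapsort_slice(lo, hi):
        # Fallback: heapify the slice and pop elements back in sorted order.
        h = a[lo:hi + 1]
        heapq.heapify(h)
        for i in range(lo, hi + 1):
            a[i] = heapq.heappop(h)

    def sort(lo, hi, depth):
        if hi - lo < 1:
            return
        if depth == 0:                 # depth cap hit: guarantee O(n log n)
            heapsort_slice(lo, hi)
            return
        p = random.randint(lo, hi)     # random pivot, Lomuto partition
        a[p], a[hi] = a[hi], a[p]
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        sort(lo, i - 1, depth - 1)
        sort(i + 1, hi, depth - 1)

    sort(0, len(a) - 1, 2 * max(1, math.floor(math.log2(max(len(a), 2)))))
```

Even if an adversary forces every partition to be maximally unbalanced, only $O(\log n)$ levels of quicksort run before heapsort takes over, so the worst case stays $O(n \log n)$.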
Why ML People Should Care
Data preprocessing: Sorting is the first step in many data pipelines. Efficient sorting of millions of feature values is essential for decision tree training (finding optimal splits) and quantile computation. It is also fundamental to model evaluation workflows.
Nearest neighbor search: Many approximate nearest neighbor algorithms rely on sorting distances. KD-trees partition data along sorted dimensions.
Rank-based metrics: AUC (area under ROC curve) requires sorting predictions by score. Rank correlation (Kendall's tau, Spearman's rho) is defined in terms of sorted orders.
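As a concrete example of a rank-based metric built on sorting, here is a sketch of AUC via the Mann-Whitney U formulation: sort by score, assign ranks (average ranks for ties), and count correctly ordered positive/negative pairs. The function name and tie-handling details are ours:

```python
def auc(scores, labels):
    """AUC via sorting: fraction of (positive, negative) pairs ranked correctly.
    Tied scores receive their average rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:                        # walk groups of tied scores
        j = i
        while j < n and scores[order[j]] == scores[order[i]]:
            j += 1
        avg = (i + j + 1) / 2           # average of 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[order[k]] = avg
        i = j
    n_pos = sum(labels)
    n_neg = n - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

The sort dominates the cost, so AUC over $n$ predictions is $O(n \log n)$, one reason sorting performance shows up directly in model-evaluation pipelines.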
Efficient implementations: Understanding sorting helps you write faster code. Knowing that your data is nearly sorted (use insertion sort or Timsort), that your keys are integers (use radix sort), or that you need stability (use mergesort) can make orders-of-magnitude performance differences.
Common Confusions
O(n log n) is a lower bound, not just an upper bound
The $\Omega(n \log n)$ bound is a lower bound on comparison-based sorting. Quicksort, mergesort, and heapsort all achieve $O(n \log n)$, matching the lower bound. This means they are optimal: you cannot do better with comparisons. Non-comparison sorts beat this bound by using more information than just pairwise comparisons.
In-place does not mean zero extra space
Quicksort is called in-place because it uses $O(\log n)$ stack space (for recursion), not $O(1)$. Heapsort uses $O(1)$ extra space. "In-place" is relative to the input size, not literally zero.
Stability matters more than you think
A stable sort preserves the relative order of equal elements. This matters when you sort by multiple keys: sort by secondary key first, then by primary key (using a stable sort). If the sort is not stable, the secondary ordering is destroyed.
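The secondary-then-primary recipe, using Python's `sorted` (which is stable, as it is Timsort):

```python
records = [("alice", "B"), ("bob", "A"), ("carol", "B"), ("dave", "A")]

# Sort by the secondary key (name) first, then by the primary key (grade).
# Because sorted() is stable, names remain in order within each grade.
by_name = sorted(records, key=lambda r: r[0])
by_grade_then_name = sorted(by_name, key=lambda r: r[1])

print(by_grade_then_name)
# [('bob', 'A'), ('dave', 'A'), ('alice', 'B'), ('carol', 'B')]
```

With an unstable sort, the second pass could scramble the name order within each grade, which is exactly the failure described above.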
Summary
- Comparison-based lower bound: $\Omega(n \log n)$ via the decision-tree argument
- Quicksort: $O(n \log n)$ expected, $O(n^2)$ worst case, in-place, not stable
- Mergesort: $O(n \log n)$ guaranteed, $O(n)$ space, stable
- Heapsort: $O(n \log n)$ guaranteed, in-place, not stable
- Non-comparison sorts (counting, radix): $O(n)$ when the key range is manageable
- The lower bound argument (counting leaves in a decision tree) is a fundamental technique in theoretical CS
Exercises
Problem
Using the decision tree argument, what is the minimum number of comparisons needed to sort 5 elements? Compare this to the number used by mergesort on 5 elements.
Problem
Prove that quicksort's worst case is $\Omega(n^2)$ by constructing an input on which deterministic quicksort (with the first element as pivot) makes $\Omega(n^2)$ comparisons.
Problem
You have $n$ integers in the range $[0, n^2)$. Which sorting algorithm should you use and what is its runtime? What if the range is $[0, 2^n)$?
References
Canonical:
- Cormen, Leiserson, Rivest, Stein, Introduction to Algorithms (CLRS), Chapters 7-9
- Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching
Current:
- Sedgewick & Wayne, Algorithms (4th ed.), Chapters 2.1-2.5
- Musser, "Introspective Sorting and Selection Algorithms" (Software: Practice and Experience 1997). The introsort recursion-depth-cap construction adopted by the C++ STL.
- Auger, Jugé, Nicaud, Pivoteau, "On the Worst-Case Complexity of TimSort" (ESA 2018). Tight bounds on Timsort's comparison count, including the bug that motivated the analysis.
- Peters, "Pattern-Defeating Quicksort" (2016), arXiv:2106.05123 (later writeup). Branchless partitioning and adversarial-input handling that drives `slice::sort_unstable` in Rust.
Next Topics
The natural next steps from sorting:
- Binary search: the key application of sorted data
- Order statistics: finding the k-th smallest element in $O(n)$ time
- Hash tables: when you need faster-than-sorting lookups
Last reviewed: April 26, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites
No direct prerequisites are declared; this is treated as an entry point.
Derived topics
No published topic currently declares this as a prerequisite.