
Quicksort Analysis

Updated at 09:12 AM

An analysis of the runtime of quicksort



QS(A, lo, hi)
	if hi <= lo then
		return A
	pIndex <- random index in [lo, hi]
	loc <- Partition(A, lo, hi, pIndex)
	QS(A, lo, loc-1)
	QS(A, loc+1, hi)

Partition(A, lo, hi, pIndex)
	pivot <- A[pIndex]
	swap(A, hi, pIndex) //park the pivot at the end
	left <- lo
	right <- hi-1
	while left <= right do
		if A[left] <= pivot then
			left <- left+1
		else
			swap(A, left, right)
			right <- right-1
	swap(A, left, hi) //move the pivot into its final position
	return left

Swap(A, a, b) //trivial function, swap values at indices a and b in the array
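The pseudocode above can be transcribed almost line for line into runnable Python. This is a sketch: the `swap` helper is replaced by Python tuple assignment, and `random.randint` plays the role of the random pivot choice.

```python
import random

def partition(a, lo, hi, p_index):
    """Partition a[lo..hi] around the pivot a[p_index]; return its final index."""
    pivot = a[p_index]
    a[p_index], a[hi] = a[hi], a[p_index]  # park the pivot at the end
    left, right = lo, hi - 1
    while left <= right:
        if a[left] <= pivot:
            left += 1                      # a[left] belongs on the small side
        else:
            a[left], a[right] = a[right], a[left]
            right -= 1                     # a[right] now holds a large element
    a[left], a[hi] = a[hi], a[left]        # put the pivot between the halves
    return left

def quicksort(a, lo=0, hi=None):
    """In-place randomized quicksort of a[lo..hi]; returns the list."""
    if hi is None:
        hi = len(a) - 1
    if hi <= lo:
        return a
    p_index = random.randint(lo, hi)       # uniform random pivot
    loc = partition(a, lo, hi, p_index)
    quicksort(a, lo, loc - 1)
    quicksort(a, loc + 1, hi)
    return a
```

For example, `quicksort([5, 3, 8, 1, 9, 2])` returns `[1, 2, 3, 5, 8, 9]`.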

This is the algorithm for quicksort, a sorting algorithm often considered the gold standard, especially for primitive types. It runs in $O(n\log n)$ expected time. There is a classic lower-bound proof showing that any comparison-based sort must use $\Omega(n\log n)$ comparisons in the worst case, so this is the best we can hope for asymptotically.

Technically, the worst-case running time of this algorithm is $O(n^2)$ (this happens when the pivot repeatedly lands at the minimum or maximum). However, with random pivots this is very unlikely in practice.

The way this algorithm approaches the sorting problem is as follows. Take some element $k$ in the array (called the pivot). Assuming distinct elements, there will be $a$ items less than it (which belong before it in sorted order) and $b$ items greater than it (which belong after it). This means $a+b+1=\operatorname{len}(A)$. Because of this, we can begin by doing two things: move every element less than the pivot before it, and move every element greater than the pivot after it. The pivot is then in its final sorted position.

With this done, you can run quicksort recursively on the first $a$ elements and the last $b$ elements.
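The counting identity above can be checked on a concrete array. This is a small illustration with a hypothetical helper `split_around` (not part of the algorithm itself), assuming distinct elements:

```python
def split_around(a, pivot):
    """Hypothetical helper: group elements of a around a pivot value (distinct elements)."""
    smaller = [x for x in a if x < pivot]
    larger = [x for x in a if x > pivot]
    return smaller, larger

A = [3, 8, 5, 1, 9]
smaller, larger = split_around(A, 5)  # pivot k = 5
a, b = len(smaller), len(larger)      # a = 2, b = 2
# a + b + 1 == len(A): the pivot sits exactly between the two groups,
# so each side can be sorted recursively and independently.
```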

Note that we select the pivot randomly. Originally, this pivot was selected simply as the middle element. However, any fixed pivot rule admits adversarial inputs that force consistently bad splits. Therefore, it does us better to select a random pivot, as it reduces the runtime in expectation regardless of the input.

Runtime Proof

Given any array of size $n$, the expected number of comparisons in quicksort is $\leq 2n\ln(n)+O(n) = \Theta(n\log n)$.

Proof: Let $y_{1}, y_{2},\dots,y_{n}$ be the $n$ elements of $A$ in sorted order. Let us define some random variables: let $X_{i,j}$ be the indicator that $y_i$ and $y_j$ are ever compared, let $X_{i,j}^k$ be the indicator that $y_i$ and $y_j$ are compared in the $k$-th call to partition, and let $X$ be the total number of comparisons, so $X = \sum_{i<j} X_{i,j}$.

Let’s make some observations based on the structure of our algorithm:

Now, to begin from the bottom, we see by LOE (linearity of expectation): $$ E[X_{i,j}] = \sum_{k=1}^{n-1}E[X_{i, j}^k] $$ Since each $X_{i,j}^k$ is an indicator variable, this can be further simplified: $$ E[X_{i,j}]=\sum_{k=1}^{n-1}\Pr[X_{i,j}^k=1] $$ Now, let $t$ be the first call to partition where one of the elements between $y_i$ and $y_j$ is chosen as the pivot.

We can now perform expected value calculations.

$$ E[X] = \sum^{n-1}_{i=1} \sum^n_{j=i+1}E[X_{i, j}] $$

Looking at the casework from before, we see that the only way $y_i$ and $y_j$ are compared is if one of them is the first pivot chosen from among $y_i, y_{i+1}, \dots, y_j$. Since the pivot is chosen uniformly at random, each of the $j-i+1$ elements in this range is equally likely to be picked first, giving a probability (and expected value) of $\frac{2}{j-i+1}$.

$$ E[X] = \sum^{n-1}_{i=1}\sum^{n}_{j=i+1}\frac{2}{j-i+1} $$

If we further simplify this, using the harmonic numbers and some algebra, we get a closed form for the expected comparison count of $T(n) = 2(n+1)\ln(n)+2c(n+1)-4n$ for a constant $c$, which, by its dominant term, yields a runtime bound of $\Theta(n\log n)$.
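The bound can be sanity-checked numerically. The sketch below does two things under the analysis above: it evaluates the double sum $\sum_{i<j} \frac{2}{j-i+1}$ exactly, and it runs an instrumented randomized quicksort (a variant of the pseudocode that counts comparisons) over random inputs. Both quantities should land below the $2n\ln(n)$ leading term of the theorem.

```python
import math
import random

def quicksort_count(a, lo, hi):
    """Randomized quicksort on a[lo..hi]; returns the number of comparisons made."""
    if hi <= lo:
        return 0
    p_index = random.randint(lo, hi)
    pivot = a[p_index]
    a[p_index], a[hi] = a[hi], a[p_index]   # park the pivot at the end
    left, right, comps = lo, hi - 1, 0
    while left <= right:
        comps += 1                          # one comparison against the pivot
        if a[left] <= pivot:
            left += 1
        else:
            a[left], a[right] = a[right], a[left]
            right -= 1
    a[left], a[hi] = a[hi], a[left]         # pivot into its final slot
    return comps + quicksort_count(a, lo, left - 1) + quicksort_count(a, left + 1, hi)

def expected_comparisons(n):
    """Evaluate the double sum sum_{i<j} 2/(j - i + 1) from the proof exactly."""
    return sum(2 / (j - i + 1) for i in range(1, n) for j in range(i + 1, n + 1))

n, trials = 1000, 50
avg = sum(quicksort_count(random.sample(range(10 * n), n), 0, n - 1)
          for _ in range(trials)) / trials
exact = expected_comparisons(n)
bound = 2 * n * math.log(n)                 # the 2 n ln n term from the theorem
# both the simulated average and the exact sum stay below 2 n ln n
```

For $n=2$ the sum gives exactly $1$, matching the single comparison needed to sort two elements.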