By Subhajit Gorai
22/11/2025

Heap vs Sorted Array: Why Building a Heap is Linear

One of the most surprising results in algorithm analysis is that building a heap takes only $O(n)$ time, while comparison-based sorting requires $\Omega(n \log n)$ comparisons. This isn't just a lucky optimization; it's a fundamental consequence of how many valid arrangements exist for each structure. Let's explore why.

The Sorted Array: Only One Way

For $n$ distinct elements, there is exactly one valid sorted arrangement. Out of all $n!$ possible permutations, only one satisfies the sorted order. The probability of randomly arriving at this configuration is:

$$P(\text{sorted}) = \frac{1}{n!}$$

Why $\Omega(n \log n)$ Comparisons Are Required

To find this single correct arrangement through comparisons, we can model the sorting process as a decision tree. Each comparison eliminates some permutations from consideration:

```text
Initial state: n! possible permutations remain

             Compare a[0] vs a[1]
              /               \
      a[0] < a[1]         a[1] < a[0]
           |                   |
      n!/2 remain         n!/2 remain
           |                   |
  [more comparisons]   [more comparisons]
           |                   |
           v                   v
    Eventually: 1 permutation remains (sorted)
```

Each comparison can at best halve the number of remaining possibilities (it's a binary decision). To narrow down from $n!$ possibilities to 1, we need at least:

$$\log_2(n!) \text{ comparisons}$$

Using Stirling's approximation:

$$\log_2(n!) \approx n \log_2(n) - n \log_2(e) \in \Theta(n \log n)$$

Let's visualize this with a concrete example for $n = 4$ elements $\{1, 2, 3, 4\}$:

```text
All 24 permutations:
[1234, 1243, 1324, 1342, 1423, 1432,
 2134, 2143, 2314, 2341, 2413, 2431,
 3124, 3142, 3214, 3241, 3412, 3421,
 4123, 4132, 4213, 4231, 4312, 4321]

After "compare 1st vs 2nd":
  - If 1st < 2nd: eliminates ~12 permutations
  - If 1st > 2nd: eliminates ~12 permutations

After several more comparisons:
  ...eventually converges to: [1234]  <- the only valid sorted output
```

The decision tree must have depth at least $\log_2(4!) = \log_2 24 \approx 4.58$, so sorting 4 elements needs at least 5 comparisons in the worst case; in general the depth is $\Omega(n \log n)$.

Key insight: Because there's only one target out of $n!$ possibilities, we need $\Omega(n \log n)$ comparisons to identify it.
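As a quick numerical sanity check (a sketch, not part of the argument itself; the function names below are my own), we can compare the exact bound $\log_2(n!)$ against the Stirling estimate:

```python
import math

def comparison_lower_bound(n):
    """Exact information-theoretic lower bound: log2(n!) comparisons."""
    # lgamma(n + 1) = ln(n!), converted to base 2
    return math.lgamma(n + 1) / math.log(2)

def stirling_estimate(n):
    """Stirling's approximation: n*log2(n) - n*log2(e)."""
    return n * math.log2(n) - n * math.log2(math.e)

for n in [4, 100, 10_000]:
    exact = comparison_lower_bound(n)
    approx = stirling_estimate(n)
    print(f"n={n}: log2(n!) = {exact:.1f}, Stirling ~ {approx:.1f}")
```

The two values agree closely already for moderate $n$, and their gap (about $\tfrac{1}{2}\log_2(2\pi n)$) is negligible next to the $n \log n$ growth.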

The Heap: Many Valid Arrangements

A max-heap satisfies the heap property: every parent is greater than or equal to its children. However, unlike a sorted array, many different arrangements can satisfy this property for the same set of elements.

Consider the array $[5, 3, 4, 1, 2]$ as a max-heap:

```text
Original heap:

        5
       / \
      3   4
     / \
    1   2
```

We can swap the left and right children at any node and still maintain the heap property:

```text
Swapped at root:

        5
       / \
      4   3      <- swapped 3 and 4
     / \
    1   2

Swapped at left child:

        5
       / \
      3   4
     / \
    2   1        <- swapped 1 and 2

Multiple swaps:

        5
       / \
      4   3
     / \
    2   1        <- both levels swapped
```

All of these are valid max-heaps! This flexibility is the key: the same set of elements admits many valid heap configurations, but only one sorted arrangement.
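To make this concrete, here is a small checker (the helper name `is_max_heap` is my own) applied to the array encodings of the trees above:

```python
def is_max_heap(a):
    """Check the heap property in array form: a[i] >= its children a[2i+1], a[2i+2]."""
    n = len(a)
    for i in range(n):
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and a[i] < a[child]:
                return False
    return True

# The original heap and the child-swapped variants from the diagrams
print(is_max_heap([5, 3, 4, 1, 2]))  # original
print(is_max_heap([5, 4, 3, 1, 2]))  # swapped at root
print(is_max_heap([5, 3, 4, 2, 1]))  # swapped at left child
```

All three print `True`, while a non-heap like `[1, 2, 3]` fails immediately.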

Counting the Number of Heaps: $H(n)$

Let's derive exactly how many distinct max-heaps can be formed from $n$ distinct elements.

Recurrence Relation

For a heap of size $n$ with the maximum at the root, we partition the remaining $n - 1$ elements into:

  • A left subtree of size $L$
  • A right subtree of size $R = n - 1 - L$

The left subtree can be any valid heap of size $L$, giving $H(L)$ possibilities. Similarly, the right subtree has $H(R)$ possibilities.

But which elements go left vs right? For a complete binary tree of size $n$, the size $L$ of the left subtree is determined by the tree shape. However, we can choose any $L$ of the remaining $n - 1$ elements to place in the left subtree. That's $\binom{n-1}{L}$ ways.

This gives the recurrence:

$$H(n) = \binom{n-1}{L} \cdot H(L) \cdot H(R)$$

where $L$ and $R$ are determined by the complete binary tree structure (the last level fills from the left), with base cases $H(0) = H(1) = 1$.
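The recurrence translates directly into code. This is a sketch with my own helper names: `left_subtree_size` computes $L$ from the complete-tree shape, and $H(0) = 1$ covers an empty subtree:

```python
from math import comb
from functools import lru_cache

def left_subtree_size(n):
    """Size of the left subtree in a complete binary tree of n nodes."""
    if n <= 1:
        return 0
    h = n.bit_length() - 1            # index of the deepest level (root is level 0)
    full = 2 ** h - 1                 # nodes strictly above the deepest level
    last = n - full                   # nodes on the deepest level
    # left subtree: its full levels, plus its share of the last level
    return (2 ** (h - 1) - 1) + min(last, 2 ** (h - 1))

@lru_cache(maxsize=None)
def num_heaps(n):
    """H(n) = C(n-1, L) * H(L) * H(R), with H(0) = H(1) = 1."""
    if n <= 1:
        return 1
    left = left_subtree_size(n)
    right = n - 1 - left
    return comb(n - 1, left) * num_heaps(left) * num_heaps(right)

print([num_heaps(n) for n in range(1, 8)])  # [1, 1, 2, 3, 8, 20, 80]
```

The memoized recursion runs in $O(\log n)$ distinct calls since subtree sizes roughly halve at each level.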

Exact Formula

There's an elegant closed form for $H(n)$:

$$H(n) = \frac{n!}{\prod_{v} s(v)}$$

where the product is over all nodes $v$ in the heap tree, and $s(v)$ is the size of the subtree rooted at $v$.

Why does this work? Place a uniformly random permutation of the $n$ elements into the tree. The arrangement is a heap exactly when every node $v$ holds the maximum of its own subtree, which happens with probability $1/s(v)$, and these events turn out to be independent across nodes. So $P(\text{heap}) = \prod_v 1/s(v)$, and multiplying by the $n!$ placements gives the formula. (This is the tree analogue of the hook length formula.)
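A quick sketch of the closed form (helper names are my own): compute every subtree size in the array representation, then divide $n!$ by their product:

```python
from math import factorial

def subtree_sizes(n):
    """s(v) for every node of a complete binary tree with n nodes (array indexing)."""
    sizes = [0] * n
    for i in range(n - 1, -1, -1):   # bottom-up: s(i) = 1 + s(left) + s(right)
        left, right = 2 * i + 1, 2 * i + 2
        sizes[i] = 1 + (sizes[left] if left < n else 0) + (sizes[right] if right < n else 0)
    return sizes

def num_heaps_closed(n):
    """H(n) = n! / prod of all subtree sizes."""
    prod = 1
    for s in subtree_sizes(n):
        prod *= s
    return factorial(n) // prod

print([num_heaps_closed(n) for n in range(1, 8)])  # [1, 1, 2, 3, 8, 20, 80]
```

The values agree with the recurrence, as the small checks below confirm by hand.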

Small Checks

Let's verify this formula for small values:

$n = 1$:

Heap:  [1]

$$H(1) = \frac{1!}{1} = 1$$

$n = 2$:

```text
Heap:

    2
    |
    1
```

Subtree sizes: root has $s = 2$, left child has $s = 1$.

$$H(2) = \frac{2!}{2 \cdot 1} = \frac{2}{2} = 1$$

$n = 3$:

```text
Heap:

    3
   / \
  1   2      (or swapped: 2 and 1)
```

Subtree sizes: root has $s = 3$, both children have $s = 1$.

$$H(3) = \frac{3!}{3 \cdot 1 \cdot 1} = \frac{6}{3} = 2$$

Using the recurrence: $H(3) = \binom{2}{1} \cdot H(1) \cdot H(1) = 2 \cdot 1 \cdot 1 = 2$

$n = 4$:

```text
All three possible heaps:

      4          4          4
     / \        / \        / \
    3   2      3   1      2   3
   /          /          /
  1          2          1
```

$L = 2$, $R = 1$ for a complete tree of size 4.
Subtree sizes: root has $s = 4$, left child $s = 2$, left-left child $s = 1$, right child $s = 1$.

$$H(4) = \frac{4!}{4 \cdot 2 \cdot 1 \cdot 1} = \frac{24}{8} = 3$$

Using the recurrence: $H(4) = \binom{3}{2} \cdot H(2) \cdot H(1) = 3 \cdot 1 \cdot 1 = 3$
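For small $n$ we can also verify these counts by brute force, checking every permutation against the heap property (function names are my own):

```python
from itertools import permutations

def is_max_heap(a):
    """Heap property in array form: every node is >= its parent's constraint."""
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))

def count_heaps_brute_force(n):
    """Count permutations of 1..n whose array form satisfies the heap property."""
    return sum(1 for p in permutations(range(1, n + 1)) if is_max_heap(p))

print([count_heaps_brute_force(n) for n in range(1, 8)])  # [1, 1, 2, 3, 8, 20, 80]
```

The brute-force counts match both the recurrence and the closed form; beyond $n \approx 10$ the $n!$ permutations make this approach impractical, which is exactly why the formulas matter.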

Asymptotic Analysis: $H(n) \approx n!/2^{\Theta(n)}$

To understand how $H(n)$ grows, let's analyze the denominator $\prod_{v} s(v)$.

In a complete binary tree:

  • There are $\Theta(\log n)$ levels
  • Level $i$ has $2^i$ nodes (for $i = 0, 1, \ldots, \log_2 n - 1$)
  • Nodes at level $i$ have subtrees of size $\Theta(n / 2^i)$

The product becomes:

$$\prod_{v} s(v) \approx \prod_{i=0}^{\log_2 n - 1} \left(\frac{n}{2^i}\right)^{2^i}$$

Taking logarithms:

$$\log_2 \prod_{v} s(v) \approx \sum_{i=0}^{\log_2 n - 1} 2^i \log_2\left(\frac{n}{2^i}\right) = \sum_{i=0}^{\log_2 n - 1} 2^i (\log_2 n - i)$$

The dominant contributions come from the deep levels, where the subtrees are small. Substituting $k = \log_2 n - i$:

$$\sum_{i=0}^{\log_2 n - 1} 2^i (\log_2 n - i) = \sum_{k=1}^{\log_2 n} \frac{n}{2^k} \cdot k = n \sum_{k=1}^{\log_2 n} \frac{k}{2^k} \le 2n$$

since $\sum_{k \ge 1} k/2^k = 2$. So the logarithm of the product is only linear in $n$:

$$\log_2 \prod_{v} s(v) \in \Theta(n), \quad \text{i.e.} \quad \prod_{v} s(v) = 2^{\Theta(n)}$$

Therefore:

$$H(n) = \frac{n!}{\prod_{v} s(v)} = \frac{n!}{2^{\Theta(n)}}$$

Using Stirling's approximation $n! \approx \sqrt{2\pi n} \left(\frac{n}{e}\right)^n$, we get:

$$H(n) \approx \frac{\sqrt{2\pi n}\, (n/e)^n}{2^{\Theta(n)}}$$

Key observation: While $n! = 2^{\Theta(n \log n)}$, we have $\log_2 H(n) = \log_2(n!) - \Theta(n)$. So $H(n)$ is still of the form $2^{\Theta(n \log n)}$, but it falls short of $n!$ by an exponential factor $2^{\Theta(n)}$.

More specifically: $\log_2 H(n) \approx n \log_2 n - cn$ for some constant $c$, compared to $\log_2(n!) \approx n \log_2 n - n \log_2 e$; the gap $\log_2(n!/H(n)) = \log_2 \prod_v s(v)$ is linear in $n$.
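We can check this linear growth numerically (the helper name is my own): the ratio $\log_2 \prod_v s(v) \,/\, n$ stays bounded by a small constant as $n$ grows, confirming that the product is $2^{\Theta(n)}$ rather than $2^{\Theta(n \log n)}$:

```python
import math

def log2_product_of_subtree_sizes(n):
    """log2 of prod s(v) over a complete binary tree with n nodes, computed bottom-up."""
    sizes = [1] * n
    total = 0.0
    for i in range(n - 1, -1, -1):
        left, right = 2 * i + 1, 2 * i + 2
        s = 1
        if left < n:
            s += sizes[left]
        if right < n:
            s += sizes[right]
        sizes[i] = s
        total += math.log2(s)   # leaves contribute log2(1) = 0
    return total

# Perfect trees of increasing size: the per-element cost settles near a constant
for n in [2**k - 1 for k in (6, 10, 14)]:
    print(n, round(log2_product_of_subtree_sizes(n) / n, 3))
```

The printed ratios stay below 2 and converge, matching the $\le 2n$ bound from the derivation above.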

Why Heap Construction is Linear

Now the punchline: because there are exponentially many valid heaps (roughly $n!/2^{\Theta(n)}$), the probability that a random arrangement is already a valid heap is much higher:

$$P(\text{valid heap}) = \frac{H(n)}{n!} = \frac{1}{2^{\Theta(n)}}$$

This is vastly larger than the $1/n!$ for sorted arrays. In the decision tree model:

$$\text{Comparisons needed} \approx \log_2\left(\frac{n!}{H(n)}\right) = \log_2(n!) - \log_2 H(n) \approx n \log_2 n - (n \log_2 n - cn) = \Theta(n)$$

Mathematical proof sketch: The standard heapify algorithm works bottom-up; a node at height $h$ needs $O(h)$ work, and a complete tree has at most $\lceil n/2^{h+1} \rceil$ nodes at height $h$. The total is:

$$\sum_{h=0}^{\log n} \frac{n}{2^{h+1}} \cdot h = n \sum_{h=0}^{\log n} \frac{h}{2^{h+1}}$$

The sum $\sum_{h=0}^{\infty} \frac{h}{2^{h+1}}$ converges to a constant (specifically, 1), so:

$$\text{Total work} = O(n)$$
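Finally, a minimal sketch of bottom-up heapify with a comparison counter (my own implementation, not taken from any particular library); the count stays below $2n$:

```python
import random

def build_max_heap(a):
    """Bottom-up heapify in place; returns the number of element comparisons made."""
    n = len(a)
    comparisons = 0

    def sift_down(i):
        nonlocal comparisons
        while True:
            left, right, largest = 2 * i + 1, 2 * i + 2, i
            if left < n:
                comparisons += 1
                if a[left] > a[largest]:
                    largest = left
            if right < n:
                comparisons += 1
                if a[right] > a[largest]:
                    largest = right
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    for i in range(n // 2 - 1, -1, -1):   # only internal nodes need sifting
        sift_down(i)
    return comparisons

for n in [1_000, 100_000]:
    a = list(range(n))
    random.shuffle(a)
    c = build_max_heap(a)
    print(f"n={n}: {c} comparisons ({c / n:.2f} per element)")
```

Each sift at height $h$ costs at most $2h$ comparisons, and the sum of heights over a complete tree is at most $n - 1$, so the total is under $2n$; the empirical per-element count reflects the constant, not a $\log n$ factor.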

The combinatorial argument reinforces this: we don't need $\Omega(n \log n)$ comparisons because we're not searching for one specific arrangement among $n!$ possibilities. We're searching for any of the $H(n) \approx n!/2^{\Theta(n)}$ valid heaps, which requires only $\Theta(n)$ comparisons.

Conclusion

The fundamental difference:

  • Sorted array: 1 valid arrangement out of $n!$ → requires $\Omega(n \log n)$ comparisons
  • Max-heap: $H(n) \approx n!/2^{\Theta(n)}$ valid arrangements → requires only $O(n)$ comparisons

This isn't just a clever algorithm optimization; it's a deep mathematical truth about the flexibility of heap structures versus the rigidity of sorted order. The heap property is local (parent $\geq$ children), allowing many global configurations, while sortedness is global (a total order), allowing only one.


References:
leimao.github.io/blog/Heap-Building-Asymptotic-Algorithm
geeksforgeeks.org/dsa/number-ways-form-heap-n-distinct-integers

This article was written with help from Claude, which drafted parts of the LaTeX and some paragraphs.
