Kth Largest in a Stream: why a size-k min-heap is the right boundary
The problem dresses itself up as a university admissions scenario, but strip that away and you have one recurring question: after every insertion into a growing sequence of integers, what is the kth largest value right now? That question has to be answerable fast — up to 10^4 calls to add — and the input keeps arriving, so you cannot defer work to the end.
The constraint that matters most is the one that sets the target: 1 <= k <= nums.length + 1. The problem guarantees that by the time you are asked to return the kth largest, there are always at least k elements in the stream. That means you never have to handle a situation where the answer does not exist.
The problem
Design a class KthLargest that, given an integer k and an initial array of integers nums, tracks a live stream of values and returns the kth largest element after every new addition. The kth largest is positional — duplicates count as separate entries. Full statement: LeetCode 703.
Constraints:
- 0 <= nums.length <= 10^4
- 1 <= k <= nums.length + 1
- -10^4 <= nums[i] <= 10^4
- -10^4 <= val <= 10^4
- At most 10^4 calls will be made to add.Constraints force the design before you pick a data structure
Start from nums.length <= 10^4 and at most 10^4 calls to add. If each add is O(n log n), the total work across all calls is on the order of 10^8 operations — that lands in the borderline zone and will almost certainly TLE on LeetCode's judge. You need each add to be at most O(log n). Actually, since you only care about k elements, O(log k) is even better.
The value range [-10^4, 10^4] is small and signed. No overflow concern. No sentinel tricks with 0. Comparisons are straightforward.
k can be 1, which means "always return the running maximum." It can also equal nums.length + 1 after the first add, which means "return the minimum of all elements so far." The data structure needs to handle both ends without special-casing.
The stream can contain duplicates. A score of 7 appearing four times is four distinct entries. The kth largest is a positional concept, not a distinctness concept.
Approach 1: sort on every add
The simplest reading of the problem is: keep all elements, sort them, return position k from the right. That is an O(1) constructor (just store k and the list) and an O(n log n) add.
from typing import List
class KthLargest:
def __init__(self, k: int, nums: List[int]):
self.k = k
self.nums = list(nums)
def add(self, val: int) -> int:
self.nums.append(val)
self.nums.sort(reverse=True)
return self.nums[self.k - 1]using System.Collections.Generic;
public class KthLargest {
private int k;
private List<int> nums;
public KthLargest(int k, int[] nums) {
this.k = k;
this.nums = new List<int>(nums);
}
public int Add(int val) {
nums.Add(val);
nums.Sort((a, b) => b.CompareTo(a));
return nums[k - 1];
}
}It works on paper. It is correct. And it is dead slow for the given constraints: sorting a list of n elements on every call means the total work is O(total*n * n _ log n), which for 10^4 adds with 10^4 initial elements becomes roughly 10^8 * log(10^4), far beyond acceptable.
The deeper problem is that sorting discards and recomputes information. After the first sort, you already know the order; the next call adds one element, and you re-sort from scratch. You are wasting the work you already did.
Time: O(n log n) per add, O(1) for construction — but you would normally pre-sort in the constructor too if you wanted to optimize the first few calls, which makes the constructor O(n log n) as well.
Space: O(n) to store all elements, growing without bound as add is called.
Approach 2: size-k min-heap — the right boundary
The insight is that you do not need the full sorted order. You need exactly one number: the kth largest. And the kth largest is the smallest element among the k largest elements.
A min-heap of size exactly k stores the k largest elements you have seen. Its root — the minimum of those k elements — is by definition the kth largest overall. Every time a new element arrives, one of two things happens: it is smaller than the current kth largest (it cannot displace any of the top k, so discard it) or it is larger or equal (it enters the top k, and the previous smallest of the top k gets evicted). The heap does both steps in O(log k).
import heapq
from typing import List
class KthLargest:
def __init__(self, k: int, nums: List[int]):
self.k = k
self.heap = list(nums)
heapq.heapify(self.heap) # O(n) linear heapify
while len(self.heap) > k:
heapq.heappop(self.heap) # evict smallest until size == k
def add(self, val: int) -> int:
heapq.heappush(self.heap, val)
if len(self.heap) > self.k:
heapq.heappop(self.heap)
return self.heap[0] # root is kth largestusing System.Collections.Generic;
public class KthLargest {
private int k;
private PriorityQueue<int, int> minHeap;
public KthLargest(int k, int[] nums) {
this.k = k;
this.minHeap = new PriorityQueue<int, int>();
foreach (int num in nums) {
minHeap.Enqueue(num, num); // priority == value -> min-heap behavior
}
while (minHeap.Count > k) {
minHeap.Dequeue();
}
}
public int Add(int val) {
minHeap.Enqueue(val, val);
if (minHeap.Count > k) {
minHeap.Dequeue();
}
return minHeap.Peek();
}
}Time: Constructor is O(n) for heapify plus O((n - k) log n) for the eviction loop, giving O(n log n) total in the worst case. Each add is O(log k) — push and at most one pop on a heap of size k.
Space: O(k). The heap never grows beyond k + 1 entries (you push then immediately pop if over).
Hand trace: k = 3, nums = [4, 5, 8, 2]
Walk through the constructor and then each add call from Example 1.
Step by step for the first two add calls:
Initial state after constructor: heap = [4, 5, 8] (internally a min-heap; root is 4). The three largest of {4, 5, 8, 2} are 8, 5, 4. The kth (3rd) largest is 4, sitting at the root.
add(3): push 3 into heap -> [3, 4, 5, 8], size 4 > k=3, pop the minimum (3) -> heap returns to [4, 5, 8]. Root = 4. 3 is not large enough to enter the top 3, so it was evicted immediately. Return 4. Correct.
add(5): push 5 -> [4, 5, 5, 8], pop minimum (4) -> [5, 5, 8]. Root = 5. The incoming 5 displaced the old 3rd largest (4). Return 5. Correct.
add(10): push 10 -> [5, 5, 8, 10], pop minimum (5) -> [5, 8, 10]. Root = 5. Return 5. The 10 joined the top 3 (10, 8, 5); the kth largest is 5.
add(9): push 9 -> [5, 8, 9, 10], pop 5 -> [8, 9, 10]. Root = 8. Return 8.
add(4): push 4 -> [4, 8, 9, 10], pop 4 -> [8, 9, 10]. Root = 8. Return 8.
Output sequence: [4, 5, 5, 8, 8]. Matches.
Edge cases that deserve explicit reasoning
Empty nums at construction: heapify([]) produces an empty heap. The while-loop condition len(heap) > k is immediately false. The first few add calls build the heap from scratch. The guarantee 1 <= k <= nums.length + 1 ensures that after enough adds the heap will reach size k before the problem asks for anything.
k = 1: The heap stores exactly one element: the running maximum. Every add that is larger than the current maximum evicts it; smaller values are immediately discarded. The heap root is always the maximum.
k equals the total count: When k equals the number of elements you end up holding, the heap stores all of them, and the root is the global minimum. This degrades to O(n log n) per add in space terms, but the time per add is still O(log k) = O(log n).
Duplicates: [7, 7, 7, 7, 8, 3] with k = 4 from Example 2. After construction the heap holds the 4 largest: [7, 7, 7, 8], root = 7. Adding 2: 2 < 7, evicted immediately, return 7. Adding 10: heap becomes [7, 7, 7, 8, 10], pop 7 -> [7, 7, 8, 10], return 7. This shows duplicates count as separate entries — the kth positional largest is 7 even when there are four 7s.
Negative values: -10^4 to 10^4. The heap comparisons are purely numeric; negatives work identically to positives. No special handling needed.
The invariant that makes this work
The heap maintains one invariant: it contains exactly the k largest elements seen so far. That invariant is established during construction (evict until size = k) and preserved on every add (push then evict if over). Given that invariant, the root is always the minimum of the k largest elements, which is exactly the kth largest by definition.
The key realization is that you are not tracking a sorted list — you are tracking a boundary. The boundary is: "is this element in the top k or not?" The heap root is that boundary value. Anything at or above the root is in the top k; anything below is not. Every add either updates the boundary (new element enters, old boundary exits) or leaves it unchanged (new element below the boundary, discarded).
Comparison table
| Approach | Constructor | Add | Space | Viable at 10^4 calls? |
|---|---|---|---|---|
| Brute force (sort on add) | O(n) | O(n log n) | O(n) | Borderline TLE |
| Min-heap of size k | O(n log n) | O(log k) | O(k) | Yes |
What I would ship, and why
Min-heap. Not even close. The brute force is correct and I'd accept it in a coding interview as a first pass, but I would never leave it there when add is called 10^4 times. The heap reduces per-call work from O(n log n) to O(log k), and uses O(k) space instead of O(n). If k is small relative to n — which is the typical use case — both differences are significant.
The one scenario where the brute force arguably wins is if you have very large k (close to n) and the data stream is short. In that case the heap does not save much memory, and the simpler code might be easier to maintain. But the problem constraints make k small in practice, so the heap is the right call.
In production you would probably reach for a language-native priority queue (Python's heapq, C#'s PriorityQueue, Java's PriorityQueue). All of them implement a binary min-heap under the hood. The pattern — push then conditionally pop to maintain size k — is the same in every language.
Where this sits in the heap series
This is problem five in the heap and priority queue series, after Last Stone Weight (1046), K Closest Points to Origin (973), Kth Largest Element in an Array (215), and Task Scheduler (621). Each of those introduced a different angle on the heap pattern.
Last Stone Weight used a max-heap to repeatedly extract the two largest. K Closest Points used a max-heap of size k to track the closest points by distance. Kth Largest Element in an Array (215) found the kth largest in a static array — the same min-heap-of-size-k trick applies there, making 703 a natural follow-on where the array becomes a live stream. Task Scheduler introduced the idea of using a heap to schedule work optimally.
703 adds the streaming dimension: the data arrives incrementally, and you need the answer after every single insertion. The heap's O(log k) per-operation cost is what makes streaming tractable.
215. Kth Largest Element in an Array is the static version of this problem. The stream is replaced by a fixed array; you can solve it with the same size-k min-heap or with quickselect. The heap approach transfers directly.
295. Find Median from Data Stream escalates the difficulty: instead of the kth largest, maintain the median. The solution uses two heaps — a max-heap for the lower half and a min-heap for the upper half — and rebalances after each insert.
347. Top K Frequent Elements shifts the criterion from value to frequency. You track k elements not by their raw value but by how often they appear. A heap of size k still applies, but the comparison key changes.
378. Kth Smallest Element in a Sorted Matrix applies the same "find kth element" question to a 2D structure where rows and columns are sorted. A min-heap drives the traversal.
Part of the series: Heap / Priority Queue
Occasional notes on what I'm building
Get an email when I publish a new post — engineering write-ups, no spam. Unsubscribe anytime.
Comments
No comments yet. Be the first.
