Longest Consecutive Sequence: the O(n) trick that skips redundant counting
Sorting the array would make this trivial: scan left to right, track the current run. That is O(n log n). The problem explicitly requires O(n). So sorting is out — not because it's hard to write but because the constraint says so. That single restriction is the whole puzzle.
What the constraints force
0 <= nums.length <= 10^5. The array can be empty. Up to 100,000 elements means an O(n log n) sort is borderline acceptable in practice, but the problem forbids it explicitly.
-10^9 <= nums[i] <= 10^9. This range is 2 billion wide. Bucket sort and direct-index tricks need an array of that size — not viable. The range also rules out any trick like "shift everything to start at 0". You need a structure that gives O(1) membership queries without caring about the actual values, which means hash table.
Duplicates are legal. The third example [1, 0, 1, 2] has two 1s and the answer is 3 (the sequence [0, 1, 2]). A hash set collapses duplicates automatically.
The O(n) constraint is not a nudge — it is a hard gate that eliminates everything except hash-based lookup. Once you see that, three approaches fall out naturally: the naive O(n²) version (list-based lookup), the sorting detour at O(n log n), and the hash set solution that actually satisfies the constraint.
Brute force: O(n²)
Before reaching for the optimal solution, consider the naive approach. For each number, check whether it is the start of a consecutive sequence by confirming its predecessor is absent, then count upward using linear search through the deduplicated values.
class Solution:
def longestConsecutive(self, nums: list[int]) -> int:
best = 0
num_list = list(set(nums)) # deduplicate first
for n in num_list:
if n - 1 not in num_list: # O(n) list membership
length = 1
while n + length in num_list: # O(n) each call
length += 1
best = max(best, length)
return bestpublic class Solution {
public int LongestConsecutive(int[] nums) {
var numList = nums.Distinct().ToList();
int best = 0;
foreach (int n in numList) {
if (!numList.Contains(n - 1)) { // O(n) list search
int length = 1;
while (numList.Contains(n + length)) // O(n) each call
length++;
best = Math.Max(best, length);
}
}
return best;
}
}The in operator on a list (and List<T>.Contains in C#) is O(n) — it scans from the beginning every time. The outer loop is O(n). The inner while can run O(n) steps. Total: O(n²). With n = 10^5, that is 10^10 operations — ten seconds or more on typical hardware. Too slow.
The fix is mechanical: swap the list for a hash set. Nothing else changes.
Sorting: O(n log n)
Sort, deduplicate, then scan for consecutive runs. After sorting, any consecutive sequence occupies a contiguous block in the array, so a single left-to-right pass finds all of them.
class Solution:
def longestConsecutive(self, nums: list[int]) -> int:
if not nums:
return 0
nums = sorted(set(nums))
best = 1
current = 1
for i in range(1, len(nums)):
if nums[i] == nums[i - 1] + 1:
current += 1
best = max(best, current)
else:
current = 1
return bestpublic class Solution {
public int LongestConsecutive(int[] nums) {
if (nums.Length == 0) return 0;
int[] sorted = nums.Distinct().OrderBy(x => x).ToArray();
int best = 1, current = 1;
for (int i = 1; i < sorted.Length; i++) {
if (sorted[i] == sorted[i - 1] + 1) {
current++;
best = Math.Max(best, current);
} else {
current = 1;
}
}
return best;
}
}O(n log n) from the sort. Distinct / set() removes duplicates — two identical values in the input would create adjacent equal entries in the sorted array, which the nums[i] == nums[i-1] + 1 check would correctly not count as consecutive. Collapsing them first is cleaner. The logic after sorting is simple and readable.
But O(n log n) violates the constraint. Sorting is out — not because it is impractical, but because the problem says so.
The optimal move: hash set with sequence-start detection
The key insight is that you only need to count from sequence starts — numbers that have no left neighbour in the input. Every other number will eventually be swept up as part of a run that began to its left. This means the outer loop can skip most elements, and the inner while loop only runs for true starts.
class Solution:
def longestConsecutive(self, nums: list[int]) -> int:
num_set = set(nums)
best = 0
for n in num_set:
if n - 1 not in num_set: # n is a sequence start
length = 1
while n + length in num_set:
length += 1
best = max(best, length)
return bestpublic class Solution {
public int LongestConsecutive(int[] nums) {
var numSet = new HashSet<int>(nums);
int best = 0;
foreach (int n in numSet) {
if (!numSet.Contains(n - 1)) { // n is a sequence start
int length = 1;
while (numSet.Contains(n + length))
length++;
best = Math.Max(best, length);
}
}
return best;
}
}Why the nested loop is still O(n)
There is a nested structure here: an outer for and an inner while. The natural worry is O(n²). It is not.
Each element in the set is touched by the inner while at most once, across the entire execution. Consider any element k. Either k - 1 is present in the set — in which case k fails the sequence-start check and the outer loop skips it — or k - 1 is absent, meaning k is a start and the inner while counts forward from k. The inner while reaches k during the forward scan only if some smaller start triggered it. No element is counted in the inner while and also serves as a start in the outer loop. These two roles are mutually exclusive.
Across all iterations combined, the inner while steps total at most n — one step per element in the set. The total work is O(n) for building the set plus O(n) for the two-role traversal. The amortized argument is the same one that justifies union-find path compression.
By-hand trace: [100, 4, 200, 1, 3, 2]
num_set = {1, 2, 3, 4, 100, 200} after construction.
The order in which the set iterates is not guaranteed — Python sets have their own internal ordering — but the result is identical regardless. Elements 2, 3, 4 are skipped in the outer loop and instead get counted when n = 1 runs the inner while. Each element is processed exactly once.
Second example: nums = [0, 3, 7, 2, 5, 8, 4, 6, 0, 1]. After deduplication, num_set = {0, 1, 2, 3, 4, 5, 6, 7, 8}. The only sequence start is 0 — every other number has a left neighbour. The inner while runs from 0 through 8, nine steps, giving length 9.
Edge cases
Empty array. best initializes to 0. The for loop body never executes. The function returns 0 correctly. An explicit early-return guard if not nums: return 0 works too but is redundant — the code handles it without it.
Single element. [5] — 4 is not in the set, so 5 is a sequence start. The while body never executes (6 is absent). Return 1.
All duplicates. [1, 1, 1] — after set construction, only 1 remains. 0 is not in the set, so 1 is a sequence start. Length is 1. Return 1. The deduplication in the set constructor is doing the work here; you never need to handle duplicates explicitly.
No consecutive pairs anywhere. [1, 3, 5, 7] — every element is a sequence start, every inner while exits immediately. Each length is 1. Return 1. Four starts, four single-element sequences.
Negative numbers. [-1, 0, 1] — -1 has no predecessor -2 in the set, so it is the start. The while counts -1 -> 0 -> 1. Length 3. Return 3. Negative numbers are not special; the arithmetic n - 1 and n + length works identically.
Complexity comparison
| Approach | Time | Space | Meets O(n) constraint |
|---|---|---|---|
| Brute force (list) | O(n²) | O(n) | No |
| Sorting | O(n log n) | O(n) | No |
| Hash set | O(n) | O(n) | Yes |
All three use O(n) space — either for the deduplicated sorted array or the hash set. The only meaningful difference is time.
The mental model this problem is teaching
The sorting approach asks: arrange elements, then scan for runs. The hash set approach asks a different question: which elements are sequence starts? These are different framings of the same data.
"Sequence start = element with no left neighbour in the set" is the anchor. Once you have that, detecting starts costs O(1) per element (one hash lookup), and counting from a start costs O(length of that sequence). Summed across all starts, the total counting work equals the number of elements in the set — O(n).
This pattern generalises. When you need to detect adjacency or neighbourhood in unsorted data without sorting, a hash set lets you query "does my neighbour exist?" in O(1). The sequence-start trick is one instance of that general move. Two Sum uses the same mechanism in a different direction — checking whether target - n is in the set rather than n - 1.
I'd ship the hash set solution in production code. The sorting approach is more readable for anyone unfamiliar with the pattern, but the O(n log n) vs O(n) difference matters for large inputs, and the hash set version is not meaningfully harder to follow once you understand sequence-start detection.
Related problems
- LeetCode 217 — Contains Duplicate: basic hash set membership; the simplest version of the "store values in a set, query existence" pattern
- LeetCode 268 — Missing Number: finding the gap in a consecutive range; the same set-membership idea but asking what is absent rather than what is present
- LeetCode 41 — First Missing Positive: first absent positive in an unsorted array; adds the twist that you want the minimum missing value, not the longest run
- LeetCode 594 — Longest Harmonious Subsequence: longest subsequence where max - min == 1; uses a hash map frequency count instead of a set, adding a count dimension to the adjacency query
Part of the series: Arrays & Hashing
Occasional notes on what I'm building
Get an email when I publish a new post — engineering write-ups, no spam. Unsubscribe anytime.
Comments
No comments yet. Be the first.