Find the Duplicate Number: when the array is a hidden linked list

by Duong Tran Quang aka DTDucasJune 14, 2026 · 13 min read#000681 reads#binary-search#two-pointers#cycle-detection#leetcode#linked-list

No ratings yetSign in to rate

An array of n + 1 integers where every value is in [1, n]. By the pigeonhole principle, at least one value must repeat. Find it. Then read the fine print: you cannot modify the array, and you get only constant extra space.

That second constraint is the entire puzzle. Without it, you sort in-place and scan in O(n) — done. Or you negate visited indices. Or you XOR. All of those require either modifying the array or O(n) auxiliary memory. Strip those away and you are left with two genuinely interesting approaches: binary search on the value domain, and cycle detection.

The problem

Given an array nums of n + 1 integers where every value falls in [1, n], find the one repeated number — without modifying the array and using only constant extra space. Full statement: LeetCode 287.

text

Constraints:
- 1 <= n <= 10^5
- nums.length == n + 1
- 1 <= nums[i] <= n
- All integers appear only once except for precisely one integer which appears two or more times.

What the constraints actually force

n goes up to 10^5. That alone rules out O(n^2) brute force — even a double loop is 10^10 operations in the worst case. You need at most O(n log n).

Every value is in [1, n], which is a tight, known range. That is the lever for binary search on values, not indices. Most binary search problems give you a sorted array and ask you to find a target; here you binary search on the answer space itself.

The array length is n + 1 and all values are in [1, n]. This is precisely the structure Floyd's algorithm needs: if you treat each index as a node and nums[i] as the "next pointer," you have a functional graph where every node has exactly one outgoing edge. The duplicate value is the node two different edges point to — which is exactly a cycle entry point.

No array modification means index-negation tricks are off the table. Constant space means hash sets and sorting into auxiliary structures are off the table. You get variables and pointer arithmetic. That is it.

Approach 1: Hash set — simple, but violates the space constraint

Worth stating clearly because every reasonable person writes this first.

python

class Solution:
    def findDuplicate(self, nums: list[int]) -> int:
        seen = set()
        for num in nums:
            if num in seen:
                return num
            seen.add(num)
        return -1  # unreachable per problem guarantee

csharp

public class Solution {
    public int FindDuplicate(int[] nums) {
        var seen = new HashSet<int>();
        foreach (int num in nums) {
            if (seen.Contains(num)) return num;
            seen.Add(num);
        }
        return -1; // unreachable
    }
}

Time: O(n). Space: O(n) — which is the constraint being violated. The hash set grows proportional to the number of distinct values seen, up to n entries.

This is the solution you'd write in a real codebase if the follow-up constraints were absent. It is correct. It is fast. In an interview, implementing this first and then explaining why you need to go further is a perfectly sensible path.

Approach 2: Binary search on the value domain — O(n log n), O(1) space

The key insight is a counting argument. For any value m in [1, n], count how many elements in nums are <= m. Call that count f(m).

If there were no duplicates, the values 1 through m would each appear exactly once, so f(m) = m. The duplicate shifts this: the repeated value contributes an extra count somewhere. Specifically, if the duplicate is d, then for all m >= d, f(m) is at least m + 1 — there is one "extra" value contributing to counts at or above d.

This monotonicity is exactly what binary search needs. You binary search on m in [1, n]:

If f(mid) > mid: the duplicate is in [left, mid], so right = mid.
If f(mid) <= mid: the duplicate is in [mid+1, right], so left = mid + 1.

When left == right, that is the duplicate.

python

class Solution:
    def findDuplicate(self, nums: list[int]) -> int:
        left, right = 1, len(nums) - 1  # value domain [1, n]
 
        while left < right:
            mid = (left + right) // 2
 
            count = sum(1 for num in nums if num <= mid)
 
            if count > mid:
                right = mid
            else:
                left = mid + 1
 
        return left

csharp

public class Solution {
    public int FindDuplicate(int[] nums) {
        int left = 1, right = nums.Length - 1;
 
        while (left < right) {
            int mid = left + (right - left) / 2;
 
            int count = 0;
            foreach (int num in nums) {
                if (num <= mid) count++;
            }
 
            if (count > mid) {
                right = mid;
            } else {
                left = mid + 1;
            }
        }
 
        return left;
    }
}

Time: O(n log n) — log n binary search steps, each requiring a full O(n) scan. Space: O(1).

Tracing through `nums = [1,3,4,2,2]`, n = 4

left=1, right=4

Iteration 1: mid=2
  Count of nums <= 2: [1,2,2] -> count=3
  3 > 2, so right=2

Iteration 2: left=1, right=2, mid=1
  Count of nums <= 1: [1] -> count=1
  1 <= 1, so left=2

left==right==2 -> return 2

The reasoning in slow motion: at mid=2, we found three values that are at most 2. But in a perfect set {1,2}, there would be exactly two. The extra count means the duplicate is somewhere in [1,2]. At mid=1, there is exactly one value at most 1, which is correct for {1}. So the duplicate is not 1 — it must be 2.

Approach 3: Floyd's cycle detection — O(n), O(1) space

This is the approach that makes you pause and think. It turns the array into an implicit linked list where node i points to nums[i].

For nums = [1,3,4,2,2]:

Start at index 0 (always). nums[0] = 1 so 0 -> 1.
nums[1] = 3 so 1 -> 3.
nums[3] = 2 so 3 -> 2.
nums[2] = 4 so 2 -> 4.
nums[4] = 2 so 4 -> 2. Cycle.

The value 2 appears twice (at indices 2 and 4). Two edges point to node 2. That is where the cycle begins. The entry point of the cycle is exactly the duplicate value.

Floyd's algorithm finds that entry point in two phases.

Phase 1: move slow one step and fast two steps per iteration. They will meet inside the cycle.

Phase 2: reset one pointer to the start (index 0). Advance both pointers one step at a time. They meet at the cycle entry — the duplicate value.

python

class Solution:
    def findDuplicate(self, nums: list[int]) -> int:
        # Phase 1: find meeting point inside the cycle
        slow = nums[0]
        fast = nums[0]
 
        while True:
            slow = nums[slow]
            fast = nums[nums[fast]]
            if slow == fast:
                break
 
        # Phase 2: find cycle entry
        slow = nums[0]
        while slow != fast:
            slow = nums[slow]
            fast = nums[fast]
 
        return slow

csharp

public class Solution {
    public int FindDuplicate(int[] nums) {
        // Phase 1: find meeting point inside the cycle
        int slow = nums[0];
        int fast = nums[0];
 
        do {
            slow = nums[slow];
            fast = nums[nums[fast]];
        } while (slow != fast);
 
        // Phase 2: find cycle entry
        slow = nums[0];
        while (slow != fast) {
            slow = nums[slow];
            fast = nums[fast];
        }
 
        return slow;
    }
}

Time: O(n). Space: O(1).

Hand trace through `nums = [1,3,4,2,2]`

Phase 1 — both start at nums[0] = 1:

Step 1: slow=nums[1]=3, fast=nums[nums[1]]=nums[3]=2
Step 2: slow=nums[3]=2, fast=nums[nums[2]]=nums[4]=2
slow==fast==2 -> met at 2

Phase 2 — reset slow to nums[0] = 1, fast stays at 2:

Step 1: slow=nums[1]=3, fast=nums[2]=4
Step 2: slow=nums[3]=2, fast=nums[4]=2
slow==fast==2 -> cycle entry is 2

Return 2. Correct.

The mathematical guarantee: if the cycle has length c and the tail (before the cycle starts) has length F, the meeting point in phase 1 is c - (F mod c) steps inside the cycle. When you reset one pointer and walk both at speed 1, they meet after exactly F more steps — which is exactly the cycle entry. The duplicate value is where the cycle begins.

Why phase 2 works: the distance argument

Let the distance from nums[0] to the cycle entry be F. Let the distance from the cycle entry to the meeting point (going forward in the cycle) be a. Let the remaining cycle length be b, so cycle length c = a + b.

When phase 1 ends, slow has traveled F + a steps. Fast has traveled F + a + nc steps for some integer n (it lapped the cycle). Since fast moves twice as fast: 2(F + a) = F + a + nc, so F + a = nc, giving F = nc - a = (n-1)c + b.

Now in phase 2: one pointer starts at the head (0 steps in), the other is at the meeting point (a steps past the cycle entry). After F more steps, the head pointer is at the cycle entry. The other pointer has advanced F = (n-1)c + b steps from the meeting point, which wraps (n-1) full cycles and then b more steps from the meeting point — bringing it exactly to the cycle entry. They meet at the cycle entry. That is the duplicate.

Edge cases that deserve a line each

[1,1] (smallest possible input, n=1): the cycle is trivial — index 0 points to 1, index 1 points to 1, you stay at 1 forever. Floyd's phase 1 finds the meeting immediately; phase 2 confirms 1 as the entry. Binary search on [1,1] returns 1 in zero iterations.

[3,3,3,3,3] (all same value): every index maps to 3, so the "linked list" is 0->3->3->3... with an immediate trivial cycle at 3. Floyd's works. Binary search at mid=2: count of values <= 2 is 0, so left=3. At mid=3: count is 5, which is > 3, so right=3. Return 3.

Duplicate appears many times (not just twice): the problem says "one or more" duplicates of the same value. Floyd's is unaffected — the cycle still forms and the entry is still the duplicate. Binary search is also unaffected — the counting argument is f(m) > m whenever the accumulated excess from duplicate appearances exceeds the midpoint.

Value equal to n (maximum boundary): if the duplicate is n itself, binary search eventually converges with left=right=n. Floyd's: some indices point to n, creating the cycle ending at n. Both handle it without special-casing.

Comparison across the three approaches

Approach	Time	Space	Constraint-compliant
Hash set	O(n)	O(n)	No — violates O(1) space
Binary search on values	O(n log n)	O(1)	Yes
Floyd's cycle detection	O(n)	O(1)	Yes

The hash set is the obvious starting point and is what you would ship if the constant-space constraint did not exist. Binary search on values is the "elegant but slower" middle ground — it satisfies the constraints and the reasoning is transparent. Floyd's is optimal in both time and space, but the proof requires knowing a non-obvious fact: that the cycle entry equals the duplicate value.

Which one I'd write

In an interview where I need to demonstrate I understand the O(1) constraint: I start with the hash set, explain why it violates the constraint, and then present Floyd's as the real solution. I write binary search if the interviewer asks for an intermediate step or if the Floyd's proof feels too tricky under pressure.

In production code with this exact problem: neither the O(1) constraint nor the no-modification constraint exists, so I'd use a hash set and move on. Floyd's is beautiful but it introduces a cognitive cost — the next engineer reading it will spend five minutes reconstructing the cycle-entry proof.

The reason to know Floyd's algorithm is not this specific problem. It is the pattern: any mapping f: {0,...,n} -> {0,...,n} with a constrained codomain contains a cycle, and Floyd's finds the cycle entry in O(n) time and O(1) space. You'll see that pattern again in Linked List Cycle II, in certain hashing constructions, and in cryptographic attacks on small state machines.

Where this sits in the linked-list series and what comes next

This problem is the first in the series to connect an array to a linked list by treating values as pointers. Everything before — reversing, merging, reordering — took an explicit linked list structure. Here you construct the implicit structure yourself and then apply the same cycle-detection machinery.

Linked List Cycle II (142) is the direct parent of this problem's core algorithm. That problem gives you an explicit linked list and asks for the cycle entry. The Floyd's implementation here is identical; the difference is just how the "nodes" and "next pointers" are represented (array indices versus actual node references).

Missing Number (268) is the complement: instead of finding a value that appears too many times, find a value in [0, n] that does not appear at all. Math trick (Gauss sum) handles it in O(n) time, O(1) space without any cycle detection.

First Missing Positive (41) escalates the challenge: find the smallest missing positive in an unsorted array, again with O(n) time and O(1) space. The solution re-encodes presence information into the array by sign-flipping — which is the in-place modification trick that problem 287 explicitly disallows.

Find All Numbers Disappeared in an Array (448) generalizes in the other direction: multiple values may be missing. It uses the same index-as-pointer framework but allows array modification, which is why it can find all missing values in a single pass without Floyd's.

The pattern unifying 287, 142, and 41 is: when integer values must live in a bounded range, that range can serve as an implicit address space. You can build data structures out of the array itself — either as a linked list (Floyd's) or as a presence bitmap (sign-flipping). The constraint that forbids the obvious approach (hashing, sorting) is pointing you directly at the implicit structure.

Part of the series: Linked List

NEWSLETTER

Occasional notes on what I'm building

Get an email when I publish a new post — engineering write-ups, no spam. Unsubscribe anytime.

Comments

No comments yet. Be the first.

Find the Duplicate Number: when the array is a hidden linked list

The problem

What the constraints actually force

Approach 1: Hash set — simple, but violates the space constraint

Approach 2: Binary search on the value domain — O(n log n), O(1) space

Tracing through nums = [1,3,4,2,2], n = 4

Approach 3: Floyd's cycle detection — O(n), O(1) space

Hand trace through nums = [1,3,4,2,2]

Why phase 2 works: the distance argument

Edge cases that deserve a line each

Comparison across the three approaches

Which one I'd write

Where this sits in the linked-list series and what comes next

Occasional notes on what I'm building

Comments

Let's build something reliable, end to end

Tracing through `nums = [1,3,4,2,2]`, n = 4

Hand trace through `nums = [1,3,4,2,2]`