Combination Sum II: when duplicates in the input become duplicates in the output

by Duong Tran Quang aka DTDucasJune 14, 2026 · 13 min read#000830 reads#recursion#array#sorting#backtracking#leetcode

Combination Sum (LeetCode 39) let you reuse any element freely — the only problem was preventing permutation-order duplicates, solved by enforcing a forward-only index. Combination Sum II strips the reuse and hands you a dirtier input: the candidates array can contain repeated values, each element is usable exactly once, and the result must still be a set of unique combinations. That combination of constraints — no reuse, potentially repeated inputs — makes duplicate avoidance harder in a specific way.

The core difficulty is this: if candidates contains [1, 1, 2] and target is 3, both the first 1 and the second 1 could anchor a valid [1, 2]. Without any guard, the backtracking tree will find [1, 2] twice — once starting from index 0 and once from index 1. The values are different positions in the array, but the combinations are semantically identical. The problem asks you to deduplicate them.

The problem

Given a collection of candidate numbers (candidates) and a target number (target), find all unique combinations in candidates where the candidate numbers sum to target. Each number may only be used once, and the solution set must not contain duplicate combinations. Full statement: LeetCode 40.

text

Constraints:
- 1 <= candidates.length <= 100
- 1 <= candidates[i] <= 50
- 1 <= target <= 30

What the constraints are actually telling you

1 <= candidates.length <= 100. A hundred elements. That is larger than the 30 in LeetCode 39, but backtracking still fits — the saved-state recursion never materializes the full 2^100 tree because of pruning, and target <= 30 is the binding limit on depth. The recursion can go at most 30 levels deep (picking 30 ones), and with at most 100 branches per level (before pruning), this is well within budget.

1 <= candidates[i] <= 50. All positive. This is crucial: there are no zeros or negatives to worry about, which means every element you add to the current combination moves the remaining target strictly downward. The remaining < 0 branch becomes unreachable if you prune with candidates[i] > remaining — once that check fires, you can break immediately because the sorted array guarantees everything beyond is also too large.

1 <= target <= 30. Target tops out at 30, and every element is at least 1, so the combination depth is at most 30. That bounds the call stack even in the worst case.

The presence of duplicates in candidates is not stated in the constraints table but is established by example: [10,1,2,7,6,1,5] contains two 1s. The constraint candidates[i] >= 1 means we cannot use value uniqueness as a proxy for position uniqueness. This forces explicit duplicate handling at the choice level, not just at the result level.

Approach 1: backtracking with a hash set to deduplicate results

The straightforward idea: run backtracking freely, then deduplicate on the way out. At each step, include or exclude the current element. When remaining == 0, sort the combination and insert it into a set. The sorting step normalizes order so that [1, 2] found via index 0 and [1, 2] found via index 1 collide in the set and reduce to one entry.

python

class Solution:
    def combinationSum2(self, candidates: list[int], target: int) -> list[list[int]]:
        result_set: set[tuple[int, ...]] = set()
 
        def backtrack(index: int, current: list[int], remaining: int) -> None:
            if remaining == 0:
                result_set.add(tuple(sorted(current)))
                return
            if remaining < 0 or index >= len(candidates):
                return
            # exclude current element
            backtrack(index + 1, current, remaining)
            # include current element
            current.append(candidates[index])
            backtrack(index + 1, current, remaining - candidates[index])
            current.pop()
 
        backtrack(0, [], target)
        return [list(combo) for combo in result_set]

csharp

using System.Collections.Generic;
using System.Linq;
 
public class Solution {
    public IList<IList<int>> CombinationSum2(int[] candidates, int target) {
        var resultSet = new HashSet<string>();
        var finalResult = new List<IList<int>>();
 
        Backtrack(0, new List<int>(), target, candidates, resultSet);
 
        foreach (var key in resultSet) {
            finalResult.Add(key.Split(',').Select(int.Parse).ToList());
        }
        return finalResult;
    }
 
    private void Backtrack(int index, List<int> current, int remaining,
                           int[] candidates, HashSet<string> resultSet) {
        if (remaining == 0) {
            var sorted = current.OrderBy(x => x).ToList();
            resultSet.Add(string.Join(",", sorted));
            return;
        }
        if (remaining < 0 || index >= candidates.Length) return;
 
        // exclude
        Backtrack(index + 1, current, remaining, candidates, resultSet);
        // include
        current.Add(candidates[index]);
        Backtrack(index + 1, current, remaining - candidates[index], candidates, resultSet);
        current.RemoveAt(current.Count - 1);
    }
}

This works, but it is wasteful in two ways. First, it explores the entire 2^n tree — the full include/exclude decision tree — without pruning paths that are already over the target. Second, it sorts and hashes every valid combination it finds, paying O(k log k) per combination where k is the combination length. If the input has many duplicate values, many paths will converge to the same combination, all wasting work before colliding in the set.

Time: O(2^n * n log n) — 2^n nodes explored, with O(n log n) sort overhead per found combination.

Space: O(2^n * n) — the hash set can accumulate all distinct combinations, each stored as a string of up to n elements.

I would not ship this. It's the right first attempt to identify what needs to be fixed, but generating duplicates then deduplicating is strictly worse than not generating them in the first place.

Approach 2: sorted backtracking with same-level skip guard

The fix is to deduplicate during the search, not after it. Sort the candidates array first. Now all equal values are adjacent. When iterating over choices at a given recursion level, if you encounter a value that is the same as the value you just tried at this level, skip it — you already explored that branch.

The condition is i > start and candidates[i] == candidates[i-1]. The i > start part is the non-obvious half. It ensures you only skip a value if it is a repeat at the current level. At different levels, the same value may appear for different reasons — the first 1 in [1, 1, 6] came from one level, the second 1 from the level below. You cannot skip based on i > 0 alone, because that would block the second 1 even when it's being considered for the next position in the combination, not the same position.

python

class Solution:
    def combinationSum2(self, candidates: list[int], target: int) -> list[list[int]]:
        candidates.sort()
        result: list[list[int]] = []
 
        def backtrack(start: int, current: list[int], remaining: int) -> None:
            if remaining == 0:
                result.append(current[:])
                return
            for i in range(start, len(candidates)):
                # skip duplicate values at the same recursion level
                if i > start and candidates[i] == candidates[i - 1]:
                    continue
                # prune: sorted array means everything beyond is also too large
                if candidates[i] > remaining:
                    break
                current.append(candidates[i])
                backtrack(i + 1, current, remaining - candidates[i])
                current.pop()
 
        backtrack(0, [], target)
        return result

csharp

using System;
using System.Collections.Generic;
 
public class Solution {
    public IList<IList<int>> CombinationSum2(int[] candidates, int target) {
        Array.Sort(candidates);
        var result = new List<IList<int>>();
        Backtrack(0, new List<int>(), target, candidates, result);
        return result;
    }
 
    private void Backtrack(int start, List<int> current, int remaining,
                           int[] candidates, IList<IList<int>> result) {
        if (remaining == 0) {
            result.Add(new List<int>(current));
            return;
        }
        for (int i = start; i < candidates.Length; i++) {
            // skip duplicate values at the same recursion level
            if (i > start && candidates[i] == candidates[i - 1]) continue;
            // prune
            if (candidates[i] > remaining) break;
 
            current.Add(candidates[i]);
            Backtrack(i + 1, current, remaining - candidates[i], candidates, result);
            current.RemoveAt(current.Count - 1);
        }
    }
}

Note i + 1 in the recursive call, not i. That single difference from LeetCode 39 is what enforces "each element used at most once." In LeetCode 39 you passed i to allow reuse. Here you pass i + 1 to consume the current position and move on.

Time: O(2^n) in the worst case (no pruning fires, no duplicates to skip), but the skip guard eliminates entire branches when the input has repeated values. The practical running time on duplicate-heavy inputs is substantially better than the brute-force version.

Space: O(target) for the recursion stack — at most target levels deep since every element is at least 1. The output cost is O(k * S) where k is the number of valid combinations and S is average combination length, but that's unavoidable.

Tracing the key example by hand

candidates = [10, 1, 2, 7, 6, 1, 5], target = 8.

After sorting: [1, 1, 2, 5, 6, 7, 10]. Indices: 0=1, 1=1, 2=2, 3=5, 4=6, 5=7, 6=10.

The four valid combinations found: [1, 1, 6], [1, 2, 5], [1, 7], [2, 6]. These match the expected output exactly.

Notice how at the root level, index 1 (value 1) is skipped because candidates[1] == candidates[0] and i > start (1 > 0). That's the skip guard in action: we already explored the branch starting at the first 1; starting at the second 1 would generate the same subtree. The whole subtree rooted at [1] (second one) is never visited.

Inside the [1] subtree (starting from index 0), the second 1 at index 1 is not skipped — because now i = 1 and start = 1, so i > start is false. The second 1 is a legitimate choice for the second position of the combination, which is how [1, 1, 6] gets found.

That asymmetry — same value, different treatment depending on whether i > start — is the entire trick.

Edge cases that trip people up

No valid combination (candidates = [5, 6, 7], target = 3): after sorting, candidates[0] = 5 > 3, so the loop breaks immediately on the first call. Result is empty with no special handling needed.

All elements identical (candidates = [2, 2, 2, 2], target = 4): after sorting, [2, 2, 2, 2]. At the root level, only index 0 is tried (index 1, 2, 3 are all skipped by the duplicate guard). From [2], the loop starts at index 1 — candidates[1] = 2, and 1 > 1 is false, so it's not skipped. [2, 2] is found; from there, [2, 2, 2] is found; then remaining = 4 - 2 - 2 = 0 no wait, target is 4 so remaining after picking two 2s is 0 — the answer is [[2, 2]]. Correct.

Single element equals target (candidates = [8], target = 8): immediately records [8]. One call to backtrack, one leaf. Clean.

Duplicates but only one usable (candidates = [1, 1], target = 1): at the root, pick index 0 (value 1), remaining becomes 0, record [1]. Back at root, index 1 is skipped (duplicate at same level). Result: [[1]]. The second 1 is correctly excluded from being a separate path to the same output.

Target smaller than smallest element (candidates = [3, 4, 5], target = 2): first element already exceeds remaining, break fires immediately. Empty result.

Forgetting to copy the list — a non-constraint edge case but a common bug: result.append(current) stores a reference to the live list. After backtracking completes, that list is empty. Always current[:] in Python or new List<int>(current) in C#.

Which approach ships

Approach 2, every time. It is not even close.

The hash-set approach (Approach 1) generates duplicate paths and then pays to sort and hash them away. The computation is wasted from the moment the duplicate path starts. Approach 2 eliminates those paths before they exist — the skip guard fires at the loop level, so the duplicated subtree is never entered. The result is the same; the work is not.

There is a temptation to reach for Approach 1 because the logic is easier to reason about: "just find everything and deduplicate." That reasoning works in problems where duplicates in the output are rare. Here they are structural — every repeated value in the input causes duplicate paths — and the cost compounds with the size of the subtree being duplicated. The skip guard is not an optimization; it is how the problem is correctly solved.

The one case where I'd reconsider: if you are in a competition and have 5 minutes left. The hash-set version is faster to type and clearly correct. In a production review, the sorted version is what I'd merge.

Comparison table

Approach	Time	Space (stack)	Duplicates generated	Notes
Brute force + set	O(`2^n * n log n`)	O(`2^n * n`)	Yes, then removed	Pays to sort and hash every found combination
Sorted + skip guard	O(`2^n`) worst case	O(target)	No	Skip fires at loop level; pruned array enables early break

Where this fits in the series

Combination Sum (LeetCode 39) established the core backtracking skeleton for combination problems: sort, start index, remaining, break on overshoot. Combination Sum II is that skeleton with two changes: i + 1 instead of i in the recursive call (no reuse), and the skip guard (handle duplicate inputs). Together those two changes are the entire diff. The rest of the article is explaining why they work.

A few natural follow-ons from here:

Combination Sum III (LeetCode 216) adds a second constraint alongside the sum: you must use exactly k numbers, drawn from 1 to 9, each at most once. The backtracking structure is the same — start index, pruning, no reuse — but you track both count and sum in the base case. There are no duplicates in the input (1–9 are distinct), so the skip guard is not needed.

Subsets II (LeetCode 90) is the direct subset version of this problem. Instead of targeting a specific sum, find all unique subsets of an array that may contain duplicates. The skip guard is identical: i > start and candidates[i] == candidates[i-1]. Same pattern, no sum constraint.

Letter Combinations of a Phone Number (LeetCode 17) looks structurally different but uses the same loop-over-choices-then-recurse skeleton. Instead of a skip guard for duplicates, the "choices" at each level are defined by a phone keypad mapping. No deduplication needed; the structure handles it by construction.

Palindrome Partitioning (LeetCode 131) takes backtracking into string space. At each position you decide where to cut, collect the prefix if it's a palindrome, and recurse on the suffix. The tree shape is different, but the push-recurse-pop skeleton is identical. No skip guard needed because the partition points are positional, not value-based.

Part of the series: Backtracking

NEWSLETTER

Occasional notes on what I'm building

Get an email when I publish a new post — engineering write-ups, no spam. Unsubscribe anytime.

Comments

No comments yet. Be the first.