Valid Sudoku: three-group constraint checking and the canonical-key pattern

by Duong Tran Quang aka DTDucasJune 14, 2026 · 13 min read#000531 reads#constraint-validation#matrix#arrays-hashing#hash-set#leetcode

No ratings yetSign in to rate

Nine rows, nine columns, nine 3x3 boxes — and a single question asked 81 times: does this digit already appear somewhere it should not? The problem is not algorithmically hard. What makes it worth studying is how the constraint structure drives the implementation. The moment you read "each row, each column, each 3x3 box must not repeat," you already know the data structure: one tracking mechanism per group, totalling 27 groups. Everything else follows from that.

One note the problem statement makes easy to miss: you are not solving the puzzle. A partially-filled board is valid as long as the filled cells do not violate the rules — even if no complete solution exists. Only filled cells ('1' through '9') are checked; '.' is skipped.

The problem

Given a 9x9 Sudoku board, determine whether the board is valid by checking that each row, each column, and each of the nine 3x3 sub-boxes contains no repeated digit from 1 to 9 — only the already-filled cells are validated, not whether a complete solution exists. Full statement: LeetCode 36.

text

Constraints:
- board.length == 9
- board[i].length == 9
- board[i][j] is a digit 1-9 or '.'.

What the fixed 9x9 size actually means

The constraints are board.length == 9, board[i].length == 9, and board[i][j] is a digit '1'-'9' or '.'. The board is always exactly 81 cells. This is not a general n x n matrix problem.

The practical consequence: every O complexity for this problem is technically O(1), because the input size is bounded by a constant (81). When I write "O(81)" or "O(27)" below, that is the honest form. I still talk through the structure as if the board were n x n so the reasoning generalizes — but do not let the O(1) label fool you into thinking one approach has no real-world advantage over another. Constant factors matter at fixed sizes.

The 9-digit alphabet (or 10 characters including .) means you can use a character set of size at most 9 per group. There is no overflow risk, no sentinel confusion. The only real constraint interaction is the box formula: given row i and column j, which of the nine 3x3 boxes owns that cell?

box_index = (i // 3) * 3 + (j // 3)

This maps the nine combinations of (i // 3, j // 3) — each being 0, 1, or 2 — to integers 0 through 8. Get this wrong (using i // 3 + j // 3 instead, for example) and boxes 0-8 collapse into indices 0-4, silently merging separate boxes. Draw the grid once and verify:

(0,0)->0  (0,1)->1  (0,2)->2
(1,0)->3  (1,1)->4  (1,2)->5
(2,0)->6  (2,1)->7  (2,2)->8

That formula is the one non-obvious piece. Everything else is iteration and set membership.

Approach 1: three separate passes

The most readable first cut mirrors the problem statement directly. Run three separate scans: once for all rows, once for all columns, once for all nine boxes. Each scan builds a fresh hash set, checks for duplicates, then discards the set before moving on.

python

class Solution:
    def isValidSudoku(self, board):
        # Check all rows
        for row in range(9):
            seen = set()
            for col in range(9):
                if board[row][col] != '.':
                    if board[row][col] in seen:
                        return False
                    seen.add(board[row][col])
 
        # Check all columns
        for col in range(9):
            seen = set()
            for row in range(9):
                if board[row][col] != '.':
                    if board[row][col] in seen:
                        return False
                    seen.add(board[row][col])
 
        # Check all nine 3x3 boxes
        for box_row in range(3):
            for box_col in range(3):
                seen = set()
                for row in range(box_row * 3, box_row * 3 + 3):
                    for col in range(box_col * 3, box_col * 3 + 3):
                        if board[row][col] != '.':
                            if board[row][col] in seen:
                                return False
                            seen.add(board[row][col])
 
        return True

csharp

public class Solution {
    public bool IsValidSudoku(char[][] board) {
        // Check all rows
        for (int row = 0; row < 9; row++) {
            var seen = new HashSet<char>();
            for (int col = 0; col < 9; col++) {
                if (board[row][col] != '.') {
                    if (seen.Contains(board[row][col])) return false;
                    seen.Add(board[row][col]);
                }
            }
        }
 
        // Check all columns
        for (int col = 0; col < 9; col++) {
            var seen = new HashSet<char>();
            for (int row = 0; row < 9; row++) {
                if (board[row][col] != '.') {
                    if (seen.Contains(board[row][col])) return false;
                    seen.Add(board[row][col]);
                }
            }
        }
 
        // Check all nine 3x3 boxes
        for (int boxRow = 0; boxRow < 3; boxRow++) {
            for (int boxCol = 0; boxCol < 3; boxCol++) {
                var seen = new HashSet<char>();
                for (int row = boxRow * 3; row < boxRow * 3 + 3; row++) {
                    for (int col = boxCol * 3; col < boxCol * 3 + 3; col++) {
                        if (board[row][col] != '.') {
                            if (seen.Contains(board[row][col])) return false;
                            seen.Add(board[row][col]);
                        }
                    }
                }
            }
        }
 
        return true;
    }
}

The logic in each of the three passes is identical: create a set, iterate the group, early-return on duplicate, otherwise add. The repetition is the downside. The board is traversed three times, and the structure that does the work is written out three times. It reads well — follow the rows check, then the columns check, then the boxes check — but it is longer than it needs to be.

Complexity. Time: O(81 * 3) = O(243) = O(1). Each of the three passes visits at most 81 cells. Space: O(9) per set, one set at a time, so O(1) peak. The set is discarded after each row/column/box, so you never hold more than 9 characters at once.

Approach 2: one pass with three arrays of sets

The key observation: every cell (i, j) belongs simultaneously to exactly one row, one column, and one box. You can check all three memberships at the same time, in a single traversal.

Pre-allocate 27 sets — rows[0..8], cols[0..8], boxes[0..8] — one per group. For each filled cell, look up the current digit in the appropriate row set, column set, and box set. If any lookup hits, return false. Otherwise, insert the digit into all three.

python

class Solution:
    def isValidSudoku(self, board):
        rows = [set() for _ in range(9)]
        cols = [set() for _ in range(9)]
        boxes = [set() for _ in range(9)]
 
        for i in range(9):
            for j in range(9):
                cell = board[i][j]
                if cell == '.':
                    continue
 
                box_index = (i // 3) * 3 + j // 3
 
                if cell in rows[i] or cell in cols[j] or cell in boxes[box_index]:
                    return False
 
                rows[i].add(cell)
                cols[j].add(cell)
                boxes[box_index].add(cell)
 
        return True

csharp

public class Solution {
    public bool IsValidSudoku(char[][] board) {
        var rows = new HashSet<char>[9];
        var cols = new HashSet<char>[9];
        var boxes = new HashSet<char>[9];
 
        for (int k = 0; k < 9; k++) {
            rows[k] = new HashSet<char>();
            cols[k] = new HashSet<char>();
            boxes[k] = new HashSet<char>();
        }
 
        for (int i = 0; i < 9; i++) {
            for (int j = 0; j < 9; j++) {
                char cell = board[i][j];
                if (cell == '.') continue;
 
                int boxIndex = (i / 3) * 3 + j / 3;
 
                if (rows[i].Contains(cell) ||
                    cols[j].Contains(cell) ||
                    boxes[boxIndex].Contains(cell)) {
                    return false;
                }
 
                rows[i].Add(cell);
                cols[j].Add(cell);
                boxes[boxIndex].Add(cell);
            }
        }
 
        return true;
    }
}

Complexity. Time: O(81) = O(1). One pass, three O(1) hash set operations per cell. Space: O(27 * 9) = O(243) = O(1). All 27 sets live simultaneously, each holding at most 9 characters. That is the tradeoff: the one-pass version uses more memory at peak (243 character slots vs. 9), but the constant is still laughably small.

Hand trace through a portion of Example 1

Let me walk the first five non-empty cells of row 0. The board begins:

Row 0: ['5','3','.','.','7','.','.','.','.']

Starting state: rows[0] = {}, cols[0..8] = {}, boxes[0..2] = {}.

Cell (0,0) = '5'
  box_index = (0//3)*3 + (0//3) = 0*3 + 0 = 0
  '5' in rows[0]? No. '5' in cols[0]? No. '5' in boxes[0]? No.
  -> rows[0] = {'5'}, cols[0] = {'5'}, boxes[0] = {'5'}

Cell (0,1) = '3'
  box_index = 0*3 + 0 = 0
  '3' in rows[0]? No. '3' in cols[1]? No. '3' in boxes[0]? No.
  -> rows[0] = {'5','3'}, cols[1] = {'3'}, boxes[0] = {'5','3'}

Cell (0,2) = '.'  -> skip

Cell (0,3) = '.'  -> skip

Cell (0,4) = '7'
  box_index = 0*3 + (4//3) = 0 + 1 = 1
  '7' in rows[0]? No. '7' in cols[4]? No. '7' in boxes[1]? No.
  -> rows[0] = {'5','3','7'}, cols[4] = {'7'}, boxes[1] = {'7'}

Now consider Example 2, where the top-left entry is changed to '8', and row 3 also begins with '8'. When we reach cell (3,0) = '8':

Cell (3,0) = '8'
  box_index = (3//3)*3 + (0//3) = 1*3 + 0 = 3
  '8' in rows[3]? No (first cell in row 3). '8' in cols[0]? Yes (from (0,0)='8').
  -> return False

The column check catches it immediately. The box check would also catch it (boxes[3] is the middle-left box, different from boxes[0] which already holds '8'), but the column hit fires first.

Comparison table

Approach	Traversals	Peak memory	Code lines
Three-pass brute force	3 passes, up to 243 cell visits	O(9) — one set at a time	~35
One-pass hash sets	1 pass, 81 cell visits	O(243) — all sets alive at once	~20

Both are O(1) time and space on this fixed-size input. In isolation, the constant difference between 243 and 81 cell visits is irrelevant. What matters is extensibility: if the board were n x n, the one-pass version scales to O(n^2) while the three-pass version also scales to O(n^2) — same asymptotic, but the one-pass version visits each cell exactly once, which is the theoretical minimum.

Edge cases that matter

All dots (empty board). Every cell hits if cell == '.' and skips. All sets remain empty. Returns true. The empty board is valid by the problem statement — there are no filled cells to check.

Single filled cell. One insert into one row set, one column set, one box set. No prior entry exists. Returns true. The single-cell case cannot violate any constraint.

Duplicate in the same row. Say '8' appears twice in row 3. The second occurrence finds '8' already in rows[3] and returns false. The column and box checks do not even run for that cell — short-circuit evaluation stops at the first hit.

Duplicate in the same box, different row and column. This is the case shown in Example 2: '8' at (0,0) and '8' at (3,0) are in different rows and different columns, but in this specific layout they happen to also differ in boxes (box 0 vs. box 3). The actual failing constraint in Example 2 is a column duplicate at column 0 — but the box check would catch it regardless if you put '8' at (0,0) and '8' at (1,1) instead, which share box 0 but not a row or column.

Duplicate in the same column, far apart rows. The duplicate appears when the column set for that column already contains the digit. Distance in rows is irrelevant — the column set has no row-ordering concept.

Box boundary cells. Cells at (2,2) and (3,3) are in different boxes (0 and 4 respectively), even though they are visually close. The formula (i//3)*3 + j//3 handles this correctly: (2//3)*3 + (2//3) = 0 and (3//3)*3 + (3//3) = 4. If you ever doubt your box formula, enumerate a few boundary cells manually.

The underlying pattern: canonical-key bucketing

What this problem is really practicing is a pattern I think of as canonical-key bucketing. You have a collection of items (cells), each of which belongs to multiple groups simultaneously. You want to detect any group that contains two identical items. The mechanism: for each possible group, maintain a set. For each item, look up its key in the sets of all its groups; on a hit, reject; otherwise insert.

The "key" here is just the digit character. The "groups" are rows, columns, boxes. But the exact same pattern appears in:

Contains Duplicate (217): one group (the whole array), key is the element value.
Group Anagrams (49): each item (word) is bucketed by its canonical key (sorted characters). Items sharing a key belong to the same group.
Longest Consecutive Sequence (128): the set is used for O(1) membership check rather than bucketing, but the principle of "does this element already exist in this conceptual group" is identical.

Recognizing canonical-key bucketing lets you solve Valid Sudoku the first time you see it. The moment the problem describes constraint groups (rows, columns, boxes), you know you need one tracking set per group, and the question reduces to "what is the group identifier formula?"

What I would ship, and why

I would write the one-pass version. Not because it runs measurably faster on a 9x9 board — it does not. The reason is code clarity under maintenance: the three-pass version repeats the same "build a set, scan a group, check for duplicates" logic three times with different iteration patterns. Any future change (different board size, additional constraint, different data source) requires updating three blocks in parallel. The one-pass version expresses the same logic once and computes all three group memberships with a single formula. It is shorter and harder to let diverge.

The one non-obvious piece is the box index formula, and it is worth commenting in production code. One line of comment, and the formula pays for itself in readability.

Where this fits in the arrays-hashing series

This is problem eight in the series. The preceding problems established the foundational hash set and hash map patterns: Two Sum (1) for complement lookup, Valid Anagram (242) for frequency counting, Contains Duplicate (217) for the simplest set-membership check, Group Anagrams (49) for key-based bucketing. Valid Sudoku is the first problem that demands multiple simultaneous sets — one per group — rather than a single global structure. That is the conceptual step this problem adds.

Sudoku Solver (37) is the natural extension. Where this problem asks "is this board valid right now," Solver asks "can you fill in the blanks to make it valid everywhere." The implementation uses backtracking, and it calls a validity check at every placement step — a check that looks exactly like what we wrote here. Understanding 36 is genuinely prerequisite to 37.

Set Matrix Zeroes (73) is a different kind of multi-group matrix constraint: if a cell is zero, its entire row and column become zero. The coordination of "this cell affects its row and its column simultaneously" is the same structural idea as "this cell is checked against its row and column simultaneously."

N-Queens (51) generalizes the constraint structure. Rows and columns appear again, but you replace the 3x3 box with diagonals. The canonical-key for a diagonal is row - col or row + col. Same bucketing pattern, different key formula.

Unique Paths with Constraints (980) adds a twist: the constraint tracking becomes dynamic as you explore the board via backtracking. The group membership concept stays; what changes is that membership is built and torn down incrementally.

Part of the series: Arrays & Hashing

NEWSLETTER

Occasional notes on what I'm building

Get an email when I publish a new post — engineering write-ups, no spam. Unsubscribe anytime.

Comments

No comments yet. Be the first.