Number of Islands: flood-fill, BFS, and when Union-Find actually earns its keep
The first time I saw this problem, my instinct was Union-Find. It looks like a textbook connected-components question, and Union-Find is the textbook connected-components structure. I coded it up, it passed, and I felt good about it for a week — until a teammate pointed out that a DFS version took about a third as many lines and performed identically. Sometimes the elegant tool is overkill.
The problem (LeetCode 200): given an m x n grid of '1' (land) and '0' (water) characters, count the number of islands. An island is a group of land cells connected horizontally or vertically. The grid edges are all water.
What the constraints actually force
1 <= m, n <= 300. The grid can be at most 300×300 = 90,000 cells. That's the budget.
An O(m×n) pass visits every cell exactly once. At 90,000 cells, that's trivially fast — sub-millisecond on any hardware. This rules out nothing and permits everything: DFS, BFS, Union-Find, they're all O(m×n) or better. The constraint doesn't push you toward any one approach.
grid[i][j] is '0' or '1'. Only two states. This is what lets us do the in-place marking trick: we can flip a visited land cell from '1' to '0', effectively absorbing it into the sea. No separate visited array needed, no extra O(m×n) space.
The implicit constraint the problem doesn't state but matters: connectivity is 4-directional (up, down, left, right), not 8-directional. Diagonal cells don't count as connected. This shapes the neighbor expansion in every approach.
DFS flood-fill
The core observation: every time you find a '1' you haven't visited, you've found the first cell of a new island. From that starting cell, a DFS can reach and mark every other cell on the same island. When the DFS returns, the entire island is gone — flipped to '0' — and the outer loop can continue hunting for the next unvisited '1'.
class Solution:
def numIslands(self, grid: list[list[str]]) -> int:
if not grid or not grid[0]:
return 0
rows, cols = len(grid), len(grid[0])
num_islands = 0
def dfs(r, c):
if r < 0 or r >= rows or c < 0 or c >= cols or grid[r][c] != '1':
return
grid[r][c] = '0'
dfs(r - 1, c)
dfs(r + 1, c)
dfs(r, c - 1)
dfs(r, c + 1)
for i in range(rows):
for j in range(cols):
if grid[i][j] == '1':
num_islands += 1
dfs(i, j)
return num_islandspublic class Solution {
public int NumIslands(char[][] grid) {
if (grid == null || grid.Length == 0) return 0;
int rows = grid.Length, cols = grid[0].Length;
int numIslands = 0;
void DFS(int r, int c) {
if (r < 0 || r >= rows || c < 0 || c >= cols || grid[r][c] != '1') return;
grid[r][c] = '0';
DFS(r - 1, c);
DFS(r + 1, c);
DFS(r, c - 1);
DFS(r, c + 1);
}
for (int i = 0; i < rows; i++) {
for (int j = 0; j < cols; j++) {
if (grid[i][j] == '1') {
numIslands++;
DFS(i, j);
}
}
}
return numIslands;
}
}The recursion terminates at boundary cells and at '0' cells. Every '1' cell the DFS touches gets flipped to '0', so it's never visited twice. The outer loop sees each cell exactly once (either '0' already, or it triggers a DFS that makes it '0'). Total work: O(m×n).
Recursive call stack depth: in the worst case, a single snake-shaped island that winds through all 90,000 cells. Stack depth would be 90,000 frames. That's fine in Python with a raised recursion limit (Python's default is 1,000, which would crash) and perfectly fine in C# (default stack is several MB). In an interview, mention this — it signals awareness that recursive DFS on a 300×300 grid needs a sanity check.
Time: O(m×n). Space: O(m×n) in the worst case from the recursive call stack.
Walking through Example 2 by hand
Example 2 from the problem, which has three islands:
["1","1","0","0","0"]
["1","1","0","0","0"]
["0","0","1","0","0"]
["0","0","0","1","1"]
The DFS traversal step by step:
After the DFS from (0,0) finishes, the top-left block is all '0'. The outer loop doesn't see cells (0,1), (1,0), (1,1) as triggers because they've been zeroed. It continues until (2,2), where the second island is a single cell. Then (3,3) and (3,4) together form the third.
BFS: same logic, explicit queue
BFS replaces the recursive call stack with an explicit queue. The island-counting structure is identical: every new '1' found by the outer loop starts a BFS that erases the whole island before continuing.
Why BFS over DFS? The honest answer: stack safety. Python's recursion limit is 1,000 by default. A 300×300 grid that's entirely one island would kill the DFS version without sys.setrecursionlimit. BFS uses a heap-allocated queue, so it handles any grid without blowing the call stack.
from collections import deque
class Solution:
def numIslands(self, grid: list[list[str]]) -> int:
if not grid or not grid[0]:
return 0
rows, cols = len(grid), len(grid[0])
num_islands = 0
directions = [(-1, 0), (1, 0), (0, -1), (0, 1)]
def bfs(start_r, start_c):
queue = deque([(start_r, start_c)])
grid[start_r][start_c] = '0'
while queue:
r, c = queue.popleft()
for dr, dc in directions:
nr, nc = r + dr, c + dc
if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == '1':
grid[nr][nc] = '0'
queue.append((nr, nc))
for i in range(rows):
for j in range(cols):
if grid[i][j] == '1':
num_islands += 1
bfs(i, j)
return num_islandspublic class Solution {
public int NumIslands(char[][] grid) {
if (grid == null || grid.Length == 0) return 0;
int rows = grid.Length, cols = grid[0].Length;
int numIslands = 0;
int[,] directions = { { -1, 0 }, { 1, 0 }, { 0, -1 }, { 0, 1 } };
void BFS(int startR, int startC) {
var queue = new Queue<(int, int)>();
queue.Enqueue((startR, startC));
grid[startR][startC] = '0';
while (queue.Count > 0) {
var (r, c) = queue.Dequeue();
for (int i = 0; i < 4; i++) {
int nr = r + directions[i, 0];
int nc = c + directions[i, 1];
if (nr >= 0 && nr < rows && nc >= 0 && nc < cols && grid[nr][nc] == '1') {
grid[nr][nc] = '0';
queue.Enqueue((nr, nc));
}
}
}
}
for (int i = 0; i < rows; i++) {
for (int j = 0; j < cols; j++) {
if (grid[i][j] == '1') {
numIslands++;
BFS(i, j);
}
}
}
return numIslands;
}
}One subtle difference in the space analysis: the BFS queue at any moment holds the "frontier" of the island being explored. For a roughly square island, that frontier is proportional to its perimeter, which in the worst case (diagonal stripe) is O(min(m,n)). For a 300×300 all-land grid doing one BFS from a corner, the queue grows to at most ~300 entries at the widest diagonal cross-section — nowhere near the 90,000 frames the DFS call stack would need.
Time: O(m×n). Space: O(min(m,n)) for the BFS queue in the worst case.
Union-Find
Union-Find models each land cell as a node. Initially, each '1' cell is its own component. We scan the grid and union any pair of adjacent '1' cells. After processing the entire grid, the number of remaining components equals the number of islands.
The implementation uses path compression (during find) and union by rank to keep each find operation nearly constant — technically O(α(n)) where α is the inverse Ackermann function, which is ≤ 4 for any practical input size.
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
self.count = 0
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # path compression
return self.parent[x]
def union(self, x, y):
rx, ry = self.find(x), self.find(y)
if rx == ry:
return
if self.rank[rx] > self.rank[ry]:
self.parent[ry] = rx
elif self.rank[rx] < self.rank[ry]:
self.parent[rx] = ry
else:
self.parent[ry] = rx
self.rank[rx] += 1
self.count -= 1
class Solution:
def numIslands(self, grid: list[list[str]]) -> int:
if not grid or not grid[0]:
return 0
rows, cols = len(grid), len(grid[0])
uf = UnionFind(rows * cols)
uf.count = sum(row.count('1') for row in grid)
for i in range(rows):
for j in range(cols):
if grid[i][j] == '1':
if j + 1 < cols and grid[i][j + 1] == '1':
uf.union(i * cols + j, i * cols + j + 1)
if i + 1 < rows and grid[i + 1][j] == '1':
uf.union(i * cols + j, (i + 1) * cols + j)
return uf.countpublic class UnionFind {
private int[] parent, rank;
public int Count;
public UnionFind(int n) {
parent = new int[n];
rank = new int[n];
for (int i = 0; i < n; i++) parent[i] = i;
}
public int Find(int x) {
if (parent[x] != x) parent[x] = Find(parent[x]);
return parent[x];
}
public void Union(int x, int y) {
int rx = Find(x), ry = Find(y);
if (rx == ry) return;
if (rank[rx] > rank[ry]) parent[ry] = rx;
else if (rank[rx] < rank[ry]) parent[rx] = ry;
else { parent[ry] = rx; rank[rx]++; }
Count--;
}
}
public class Solution {
public int NumIslands(char[][] grid) {
if (grid == null || grid.Length == 0) return 0;
int rows = grid.Length, cols = grid[0].Length;
var uf = new UnionFind(rows * cols);
foreach (var row in grid)
foreach (var cell in row)
if (cell == '1') uf.Count++;
for (int i = 0; i < rows; i++) {
for (int j = 0; j < cols; j++) {
if (grid[i][j] == '1') {
if (j + 1 < cols && grid[i][j + 1] == '1')
uf.Union(i * cols + j, i * cols + j + 1);
if (i + 1 < rows && grid[i + 1][j] == '1')
uf.Union(i * cols + j, (i + 1) * cols + j);
}
}
}
return uf.Count;
}
}Note: the Union-Find version only scans right and down neighbors (not all four directions). This is intentional — if you process cells left-to-right, top-to-bottom, then the left neighbor was already processed when you get to (i,j), and you would have already unioned that pair then. Only the right and bottom neighbors are "new."
Time: O(m×n × α(m×n)) ≈ O(m×n). Space: O(m×n) for the parent and rank arrays.
Complexity table
| Approach | Time | Space | Notes |
|---|---|---|---|
| DFS (recursive) | O(m×n) | O(m×n) | Call stack depth; watch Python recursion limit |
| BFS (queue) | O(m×n) | O(min(m,n)) | Heap queue; best space for wide islands |
| Union-Find | O(m×n × α(n)) | O(m×n) | Practically O(m×n); overkill here but extensible |
Edge cases that actually matter
Empty grid. grid = [] or grid = [[]]. Both DFS and BFS check not grid or not grid[0] up front and return 0. Easy.
All water. grid = [["0","0"],["0","0"]]. The outer loop never finds a '1', counter stays at 0. The DFS and BFS functions are never called.
All land. grid = [["1","1"],["1","1"]]. The first '1' at (0,0) triggers DFS/BFS that floods the entire grid. Counter increments to 1 and never again — every subsequent cell in the outer loop sees '0'. Correct.
Single cell. grid = [["1"]]. One cell, one island, trivially returns 1. grid = [["0"]] returns 0.
Non-rectangular-looking shapes. An L-shaped or U-shaped island is still one connected component. DFS/BFS handle this without any special logic because they recurse through all reachable neighbors. The shape doesn't matter — connectivity does.
Python recursion limit. For DFS on a 300×300 all-land grid, Python would need a recursion depth of up to 90,000. You'd need sys.setrecursionlimit(90001) or switch to BFS. In an interview in Python, I'd use BFS to avoid the question entirely.
Which one to ship
I'd reach for BFS in production Python and DFS in C# or Go. Here's the thinking.
In Python, the default recursion limit is 1,000. A 300×300 island hits that immediately. BFS dodges the problem with a deque — same O(m×n) runtime, better practical safety, and honestly not much harder to write once you've done it a few times. The space complexity advantage (O(min(m,n)) queue vs. O(m×n) call stack) is a real win for large all-land grids.
In C# (and most compiled languages), the call stack is several megabytes, and a 90,000-frame DFS is fine. The DFS code is shorter, the recursion reads like the description, and there's nothing to worry about. I'd write DFS.
Union-Find is the right answer only if the problem changes: if you need to handle incremental updates (cells flipping from water to land over time), or if you need to track island sizes or merge events, Union-Find scales cleanly. For a static grid with a simple count query, it's more code with no practical payoff.
The pattern underneath
This problem introduces the most fundamental graph pattern in the LeetCode catalog: connected components via flood-fill. You are given a set of nodes (cells) with implicit edges (adjacency), and you need to count how many disconnected groups exist.
The matrix encoding is the only novelty compared to a textbook graph problem. There's no explicit adjacency list — you derive neighbors on the fly from (r±1, c) and (r, c±1). Everything else maps directly: DFS/BFS over a graph, mark-visited to avoid cycles, count connected components.
Once you see this shape, you'll recognize it in a wide class of problems. The grid changes, the values change, the connectivity rule changes — but the underlying algorithm is always the same exploration.
Where this sits in the graphs series
Number of Islands is the entry point to graph traversal on grids. It teaches flood-fill cleanly because the problem is simple enough that the mechanics aren't obscured by anything else.
The natural next problems:
- LeetCode 695 — Max Area of Island. Same grid, same DFS structure, but instead of counting islands you return the size of the largest one. You thread a size counter through the DFS. The shape change is small; what you return changes.
- LeetCode 130 — Surrounded Regions. You mark
'O'cells that are not reachable from the border, then flip the rest. Requires thinking about which component the border belongs to — a good exercise in reversing the connectivity question. - LeetCode 1020 — Number of Enclaves. Count land cells that have no path to the grid boundary. You start BFS/DFS from every border land cell and subtract the visited count from the total. Same flood-fill, different accounting.
- LeetCode 694 — Number of Distinct Islands. Count islands that have different shapes. You need to canonicalize the path taken during DFS to get a shape signature. Union-Find won't help here; DFS with path recording is the right tool.
Each problem adds exactly one twist. Work through them in order and the pattern deepens with each step.
Occasional notes on what I'm building
Get an email when I publish a new post — engineering write-ups, no spam. Unsubscribe anytime.
Comments
No comments yet. Be the first.
