Clone Graph: deep-copying a cyclic structure without infinite loops
The problem feels simple until you think about cycles. You have a connected undirected graph — any number of nodes, each with a list of neighbors — and you need to return a completely independent copy. Same structure, same values, but no shared object references. The constraint that makes this hard: nodes point to each other, so a recursive clone of node 1 will eventually recurse back into node 1 and run forever. That's the whole problem. Everything else is bookkeeping.
The problem is LeetCode 133. The graph has at most 100 nodes and no self-loops or repeated edges.
What the constraints force
0 <= nodes <= 100. Small. This means any algorithm that visits each node and edge at most once — O(N + M) — is fast enough. There's no pressure toward a clever O(log N) approach. The interesting constraint is about the structure, not the scale.
Node.val is unique and in [1, 100]. This means you can use the node object itself as a hash map key (by identity, not value). In practice that's exactly what you do: original_node -> cloned_node. The uniqueness of values also means you could use an integer-keyed array of size 101 as a lookup table instead of a hash map, though it's rarely worth the ergonomic cost.
The graph is connected. Every node is reachable from the given starting node. You don't need to handle disconnected components — one traversal from node 1 is guaranteed to reach everything.
No self-loops, no repeated edges. This doesn't change the algorithm but it means you don't need to handle the case where a node is its own neighbor, and you won't encounter duplicate edge entries.
The graph has cycles. This is the constraint that forces everything. You cannot clone a graph with cycles using a plain recursive descent without tracking which nodes you've already cloned.
DFS with a hash map
The core idea: use a hash map to track which original nodes have already been cloned. Before recursing into a node's neighbors, create and register the clone. That registration is what breaks the cycle.
Here's what breaks if you don't do it in the right order:
clone(1) -> create clone_1 -> clone(2) -> create clone_2
-> clone(1) # 1 is already a neighbor of 2
-> create ANOTHER clone_1 # oops, infinite recursion
The fix is: create the clone and put it in the map before processing neighbors. Then any later recursive call that arrives at the same node finds the existing clone and returns it immediately, without recursing further.
class Solution:
def cloneGraph(self, node: 'Node') -> 'Node':
if not node:
return None
cloned = {}
def dfs(n):
if n in cloned:
return cloned[n]
clone = Node(n.val)
cloned[n] = clone # register before recursing
for neighbor in n.neighbors:
clone.neighbors.append(dfs(neighbor))
return clone
return dfs(node)public class Solution {
public Node CloneGraph(Node node) {
if (node == null) return null;
var cloned = new Dictionary<Node, Node>();
return Dfs(node, cloned);
}
private Node Dfs(Node node, Dictionary<Node, Node> cloned) {
if (cloned.ContainsKey(node))
return cloned[node];
var clone = new Node(node.val);
cloned[node] = clone; // register before recursing
foreach (var neighbor in node.neighbors)
clone.neighbors.Add(Dfs(neighbor, cloned));
return clone;
}
}The Python version closes over cloned in the nested function. The C# version passes it explicitly as a parameter — same logic, different scoping convention.
Tracing through the example
The canonical example: four nodes in a square with cross-edges omitted. Adjacency: 1-2, 1-4, 2-3, 3-4. Starting from node 1:
In prose, step by step through the neighbor lists:
dfs(1) -> create clone_1, register 1->clone_1
dfs(2) -> create clone_2, register 2->clone_2
dfs(1) -> found in map, return clone_1
dfs(3) -> create clone_3, register 3->clone_3
dfs(2) -> found in map, return clone_2
dfs(4) -> create clone_4, register 4->clone_4
dfs(1) -> found in map, return clone_1
dfs(3) -> found in map, return clone_3
clone_3.neighbors = [clone_2, clone_4]
clone_2.neighbors = [clone_1, clone_3]
dfs(4) -> found in map, return clone_4
clone_1.neighbors = [clone_2, clone_4]
Every node gets created exactly once. Every back-edge resolves to an already-existing clone. The resulting structure is isomorphic to the original.
BFS with a queue
BFS is a straightforward alternative. Instead of recursion, you maintain an explicit queue of original nodes to process. The hash map serves the same role — it tells you whether a neighbor has been cloned yet.
The key difference in code structure: with DFS, you append to clone.neighbors after the recursive call returns. With BFS, you append to clone.neighbors during the loop, relying on the fact that the clone was created (if not yet fully populated) when you first encountered it.
from collections import deque
class Solution:
def cloneGraph(self, node: 'Node') -> 'Node':
if not node:
return None
cloned = {node: Node(node.val)}
queue = deque([node])
while queue:
current = queue.popleft()
for neighbor in current.neighbors:
if neighbor not in cloned:
cloned[neighbor] = Node(neighbor.val)
queue.append(neighbor)
cloned[current].neighbors.append(cloned[neighbor])
return cloned[node]public class Solution {
public Node CloneGraph(Node node) {
if (node == null) return null;
var cloned = new Dictionary<Node, Node>();
cloned[node] = new Node(node.val);
var queue = new Queue<Node>();
queue.Enqueue(node);
while (queue.Count > 0) {
var current = queue.Dequeue();
foreach (var neighbor in current.neighbors) {
if (!cloned.ContainsKey(neighbor)) {
cloned[neighbor] = new Node(neighbor.val);
queue.Enqueue(neighbor);
}
cloned[current].neighbors.Add(cloned[neighbor]);
}
}
return cloned[node];
}
}Notice: the clone of neighbor is created on its first encounter (when first seen as a neighbor of any processed node), before neighbor itself has been dequeued. That's fine — we're only setting val at creation time. The neighbors list gets populated when neighbor is eventually dequeued. But we can already append cloned[neighbor] to the current node's neighbor list because the clone object exists.
Comparing the two approaches
| Approach | Time | Space | Notes |
|---|---|---|---|
| DFS recursive | O(N + M) | O(N) | Clean code; call stack depth up to N |
| BFS iterative | O(N + M) | O(N) | No stack risk; slightly more code |
Time is identical — every node and edge is processed exactly once in both cases. Space is also effectively identical: DFS uses O(N) for the hash map plus O(H) for the call stack where H is the longest path in the graph, while BFS uses O(N) for the hash map plus O(N) for the queue at its largest. At N=100 neither matters.
The practical difference is stack overflow risk. With N=100 and a maximum call depth of 100 frames, DFS is completely safe here. If you were cloning a graph with millions of nodes chained in a long path, the recursive DFS would overflow and BFS would not. Given the constraints, I'd ship the DFS version — it's shorter and the intent is more directly visible in the code.
Edge cases
Null input. node is null (Python: None). Both implementations check this first and immediately return null. The hash map never needs to be created.
Single node with no neighbors. adjList = [[]]. The DFS creates one clone, finds no neighbors to recurse into, and returns the clone. Correct.
Self-referential graph. The constraints say no self-loops, but consider what happens if a node did appear in its own neighbor list: the DFS would call dfs(node) while processing node's neighbors, find node already in the map, and return the existing clone. No infinite recursion. The registration-before-recursion invariant handles it.
Linear chain. Nodes form a path: 1-2-3-...-100. DFS recurses 100 levels deep. Fine at this scale. The chain has no cycles, so each node is encountered exactly once from one direction.
Complete graph (K_n). Every node connects to every other node. N=100 means each node has 99 neighbors. Total edges: N*(N-1)/2 ≈ 5,000. Still O(N + M), still fine. The hash map check at the start of each DFS call prevents any redundant work.
The mental model and where it generalizes
This problem is an instance of a general pattern: traversal with a memo table. You're doing a graph traversal (DFS or BFS), and you're building something as you go. The memo table (hash map from original nodes to clones) serves two purposes simultaneously: it prevents revisiting nodes (the usual visited-set role), and it maps old nodes to new ones so you can wire up edges in the clone.
The same pattern shows up in:
- Copy List with Random Pointer (LeetCode 138) — a linked list where each node has a
nextand arandompointer that can point anywhere. The random pointer makes it cyclic in the same structural sense. Same fix: hash map from original node to clone, two-pass or single-pass depending on the version. - Copy Binary Tree with Random Pointer (LeetCode 1485) — same idea on a tree, with the twist that random pointers create the cycle risk.
- Number of Islands (200) — no cloning, but the DFS traversal with a visited marker is structurally identical. The marker prevents re-entering cells the same way the hash map prevents re-entering nodes.
- Pacific Atlantic Water Flow (417) — multi-source BFS from two boundaries, building up a visited set for each. The queue management mirrors the BFS clone approach.
The series continues with Number of Islands and then Course Schedule — the latter adding a directed edge and requiring you to detect cycles explicitly rather than just avoid infinite traversal. Clone Graph is a good first graph problem because the cycle issue is mechanical: one hash map, one ordering rule. The harder problems require you to reason about what the cycles mean.
Occasional notes on what I'm building
Get an email when I publish a new post — engineering write-ups, no spam. Unsubscribe anytime.
Comments
No comments yet. Be the first.
