Serialize and Deserialize Binary Tree: encoding structure, not just values
The problem gives you complete freedom in how you encode the tree. That freedom is the trap. The naive instinct is to record just the values — but values alone are not enough to reconstruct a unique tree. The string "1 2 3" could describe a root with two children, or a skewed chain, or several other shapes. You need to encode the structure too, and the question is how.
Two natural encodings fall out of the standard traversal strategies: BFS level-order (the same layout LeetCode uses in its own examples) and DFS preorder (the recursive approach that mirrors how you build a tree from a recursive description). Both work. Both are O(N) time and space. But they differ in string length, decode complexity, and stack behavior under the constraints of this problem.
The problem
Design an algorithm to serialize a binary tree to a string and deserialize that string back to the original tree structure. There is no restriction on how your serialization and deserialization algorithm should work — you just need to ensure the round-trip is lossless. Full statement: LeetCode 297.
Constraints:
- The number of nodes in the tree is in the range [0, 10^4].
- -1000 <= Node.val <= 1000What the constraints force
The tree has [0, 10^4] nodes, values in [-1000, 1000]. Neither number is scary on its own, but together they establish a few things worth naming before you write code.
Node count up to 10^4 rules out any O(N^2) approach — though with a single traversal you are never in danger of that here. More importantly for the DFS solution, 10^4 means the recursion stack can go 10^4 frames deep in a fully skewed tree. Python's default limit is 1000. That is a real concern. In a competitive or production setting you either set sys.setrecursionlimit or switch to an explicit stack for the DFS version. The BFS solution sidesteps this entirely.
Values in [-1000, 1000] include negatives and zero. The sign is part of the value; your delimiter choice must not collide with the minus sign. Comma works cleanly because node values are integers with no comma in their string representation. The hash # and null string are both fine null markers for the same reason.
The empty tree (N = 0) is in range. Both approaches must handle the case where root is None/null from the start.
Approach 1: BFS level-order
The level-order encoding writes values row by row, left to right, using null to represent absent children. It is exactly LeetCode's own format for test cases. The key insight for the decode is: if a node appears at position i in the value list, its left child is at position 2i+1 and its right child at 2i+2 — but that index trick only works for complete trees. Instead, the decode uses a queue: each node dequeued claims the next two values as its children.
from collections import deque
class TreeNode:
def __init__(self, x):
self.val = x
self.left = None
self.right = None
class Codec:
def serialize(self, root: TreeNode) -> str:
if not root:
return ""
result = []
queue = deque([root])
while queue:
node = queue.popleft()
if node:
result.append(str(node.val))
queue.append(node.left)
queue.append(node.right)
else:
result.append("null")
# trim trailing nulls -- they are not needed for reconstruction
while result and result[-1] == "null":
result.pop()
return ",".join(result)
def deserialize(self, data: str) -> TreeNode:
if not data:
return None
values = data.split(",")
root = TreeNode(int(values[0]))
queue = deque([root])
i = 1
while queue and i < len(values):
node = queue.popleft()
if i < len(values) and values[i] != "null":
node.left = TreeNode(int(values[i]))
queue.append(node.left)
i += 1
if i < len(values) and values[i] != "null":
node.right = TreeNode(int(values[i]))
queue.append(node.right)
i += 1
return rootusing System;
using System.Collections.Generic;
public class Codec {
public string serialize(TreeNode root) {
if (root == null) return "";
var result = new List<string>();
var queue = new Queue<TreeNode>();
queue.Enqueue(root);
while (queue.Count > 0) {
TreeNode node = queue.Dequeue();
if (node != null) {
result.Add(node.val.ToString());
queue.Enqueue(node.left);
queue.Enqueue(node.right);
} else {
result.Add("null");
}
}
// trim trailing nulls
while (result.Count > 0 && result[result.Count - 1] == "null")
result.RemoveAt(result.Count - 1);
return string.Join(",", result);
}
public TreeNode deserialize(string data) {
if (string.IsNullOrEmpty(data)) return null;
string[] values = data.Split(',');
TreeNode root = new TreeNode(int.Parse(values[0]));
var queue = new Queue<TreeNode>();
queue.Enqueue(root);
int i = 1;
while (queue.Count > 0 && i < values.Length) {
TreeNode node = queue.Dequeue();
if (i < values.Length && values[i] != "null") {
node.left = new TreeNode(int.Parse(values[i]));
queue.Enqueue(node.left);
}
i++;
if (i < values.Length && values[i] != "null") {
node.right = new TreeNode(int.Parse(values[i]));
queue.Enqueue(node.right);
}
i++;
}
return root;
}
}Walking through the example
Tree [1, 2, 3, null, null, 4, 5]:
Serialization, level by level:
Queue: [1]
Dequeue 1 -> result: ["1"], enqueue 2, 3
Queue: [2, 3]
Dequeue 2 -> result: ["1","2"], enqueue null, null
Dequeue 3 -> result: ["1","2","3"], enqueue 4, 5
Queue: [null, null, 4, 5]
Dequeue null -> result: [...,"null"]
Dequeue null -> result: [...,"null"]
Dequeue 4 -> result: [...,"4"], enqueue null, null
Dequeue 5 -> result: [...,"5"], enqueue null, null
Queue: [null, null, null, null]
All four -> "null","null","null","null"
Before trim: "1,2,3,null,null,4,5,null,null,null,null"
After trim: "1,2,3,null,null,4,5"
Deserialization of "1,2,3,null,null,4,5":
values = ["1","2","3","null","null","4","5"]
root = TreeNode(1), queue = [1], i = 1
Dequeue 1: values[1]="2" -> 1.left=2, queue=[2]; i=2
values[2]="3" -> 1.right=3, queue=[2,3]; i=3
Dequeue 2: values[3]="null"-> skip; i=4
values[4]="null"-> skip; i=5
Dequeue 3: values[5]="4" -> 3.left=4, queue=[4]; i=6
values[6]="5" -> 3.right=5, queue=[4,5]; i=7
Dequeue 4: i=7 >= length, loop ends
Result: the original tree reconstructed.
Complexity
Time: O(N) for both serialize and deserialize. Every node is enqueued and dequeued exactly once.
Space: O(N) for the queue and the output string. In the widest level of a complete tree, the queue holds O(N/2) entries.
Approach 2: DFS preorder
Preorder traversal visits root before its children. The crucial property this gives you is that when you deserialize, each value you read is exactly the root of the next subtree to build. You do not need bookkeeping about which child slot you are filling — the recursive structure handles it automatically.
The null marker here is #. When the deserializer reads #, it returns None immediately and the parent gets back a null child. No look-ahead, no queue-based bookkeeping.
class Codec:
def serialize(self, root: TreeNode) -> str:
parts = []
def preorder(node):
if not node:
parts.append("#")
return
parts.append(str(node.val))
preorder(node.left)
preorder(node.right)
preorder(root)
return ",".join(parts)
def deserialize(self, data: str) -> TreeNode:
it = iter(data.split(","))
def build():
val = next(it)
if val == "#":
return None
node = TreeNode(int(val))
node.left = build()
node.right = build()
return node
return build()using System;
using System.Collections.Generic;
public class Codec {
public string serialize(TreeNode root) {
var parts = new List<string>();
void Preorder(TreeNode node) {
if (node == null) {
parts.Add("#");
return;
}
parts.Add(node.val.ToString());
Preorder(node.left);
Preorder(node.right);
}
Preorder(root);
return string.Join(",", parts);
}
public TreeNode deserialize(string data) {
string[] values = data.Split(',');
int index = 0;
TreeNode Build() {
if (index >= values.Length || values[index] == "#") {
index++;
return null;
}
var node = new TreeNode(int.Parse(values[index++]));
node.left = Build();
node.right = Build();
return node;
}
return Build();
}
}The preorder trace
Same tree [1, 2, 3, null, null, 4, 5]. Preorder visits: 1, then the entire left subtree, then the entire right subtree.
Serialize output: "1,2,#,#,3,4,#,#,5,#,#"
Deserialize trace — the iterator reads left to right, and the recursive calls consume values in the same preorder sequence:
build(): read "1" -> node=1
node.left = build(): read "2" -> node=2
node.left = build(): read "#" -> return None
node.right = build(): read "#" -> return None
return node=2
node.right = build(): read "3" -> node=3
node.left = build(): read "4" -> node=4
node.left = build(): read "#" -> return None
node.right = build(): read "#" -> return None
return node=4
node.right = build(): read "5" -> node=5
node.left = build(): read "#" -> return None
node.right = build(): read "#" -> return None
return node=5
return node=3
return node=1
The iterator is consumed exactly once, in the same order the serializer wrote it. That symmetry is why this works without any queue or index bookkeeping.
Complexity
Time: O(N). Each node produces exactly one value token in serialize; each token is consumed exactly once in deserialize.
Space: O(N) for the output string. Call stack depth is O(H), where H is the height of the tree. For a balanced tree this is O(log N). For a skewed tree it is O(N) -- and with 10^4 nodes, that is 10^4 stack frames, which exceeds Python's default limit of 1000. The C# implementation also uses recursive calls and will hit the same concern at scale.
This is the one place where the DFS approach has a real practical weakness that BFS does not share.
Edge cases that matter
Empty tree. root = None / null. The BFS serializer returns "" immediately; the BFS deserializer returns None on empty string. The DFS serializer outputs "#" for the null root; the DFS deserializer reads "#" and returns None. Both round-trip correctly. Test this first.
Single node. root = [5]. BFS encodes "5", decodes to a node with no children. DFS encodes "5,#,#" -- the two # markers for left and right children. Both decode back to the original single node. The extra markers in DFS are structural information, not noise.
Left-skewed or right-skewed chain. A 10^4-node chain is the DFS recursion killer. BFS processes it iteratively with a queue that never grows beyond one entry at a time (since each node has at most one child). DFS needs a call stack 10^4 frames deep. If you care about worst-case behavior within the given constraints, BFS wins here.
Negative values. -1000 serializes as "-1000". The comma delimiter does not conflict with the minus sign. Parsing with int() or int.Parse() handles the sign correctly. No special case needed.
Values that look like the null marker. Your null marker ("null" or "#") must not be a valid integer. Both choices are safe: no integer serializes to "null" or "#".
Trailing nulls in BFS. The trimming step in the BFS serializer is an optimization, not a correctness requirement. The deserializer works fine if trailing nulls are present, because the loop condition i < len(values) prevents overrun, and "null" values simply cause the if values[i] != "null" check to skip. Trimming keeps the encoded string shorter.
Comparison
| Approach | Serialize | Deserialize | String length | Stack risk at 10^4 nodes |
|---|---|---|---|---|
| BFS level-order | O(N) | O(N) | Shorter (trims trailing nulls) | None (iterative) |
| DFS preorder | O(N) | O(N) | Longer (explicit null markers for every leaf) | Real (recursive, 10^4 frames) |
Both are linear. The space for the encoded string is O(N) in both cases: BFS writes at most 2N-1 tokens before trimming, DFS writes exactly 2N+1 tokens (N values plus N+1 null markers for a full binary tree). In practice the BFS string after trimming is shorter for sparse trees; the DFS string can be longer because it marks every leaf's null children.
Which one to ship
I would ship the DFS preorder version for interviews and for most real use cases, with a note about recursion depth. The reason is the symmetry between encode and decode: the preorder serializer and the iterator-driven deserializer are structurally identical -- one writes, one reads, both follow the same traversal order. That makes the pair easy to reason about and easy to verify correct. The BFS version requires you to think about how the queue in the deserializer mirrors the queue in the serializer, and the index bookkeeping is a source of off-by-one errors under pressure.
The one situation where I reach for BFS first is when the problem explicitly mentions very deep trees or when stack depth is a stated constraint. 10^4 nodes in a skewed tree is right at Python's default limit. If the problem said 10^5 or 10^6, BFS is not optional.
The underlying pattern and where it shows up
The design insight here is structural encoding: you cannot encode a tree from values alone because the same multiset of values can describe different trees. You need either an explicit structure signal (null markers, as in DFS preorder) or a traversal order that is self-delimiting (level-order with a positional queue). Both strategies appear in serialization problems everywhere -- not just trees.
The same pattern drives these neighbors in the trees series:
Serialize and Deserialize BST (449) adds the BST property: left children are smaller, right children are larger. That constraint makes null markers optional -- you can reconstruct the tree from preorder values alone by knowing where left and right subtrees must split. The encoding is shorter but the decode is more work.
Serialize and Deserialize N-ary Tree (428) extends the design to trees where each node can have any number of children. The preorder approach generalizes by encoding the child count at each node. The same structural-encoding insight applies; you just need more information per node.
Find Duplicate Subtrees (652) uses serialization as a tool rather than the goal. You serialize each subtree to a canonical string and use a hash map to detect repeats. The DFS preorder encoding with null markers works directly as the canonical form.
Construct Binary Tree from Preorder and Inorder Traversal (105) is the inversion: given two traversal sequences (without null markers), reconstruct the tree. It shows why preorder alone is insufficient without structural markers -- you need inorder to tell you where the left subtree ends. Problem 297's preorder-with-nulls encoding avoids this ambiguity by making the structure explicit.
Occasional notes on what I'm building
Get an email when I publish a new post — engineering write-ups, no spam. Unsubscribe anytime.
Comments
No comments yet. Be the first.
