Word Ladder: shortest path through a graph you never explicitly built
You are not transforming words. You are walking a graph where every node is a word and every edge connects two words that differ by exactly one letter. The graph is never stored anywhere — it exists implicitly in the structure of wordList — but the problem is still just "find the shortest path from source to destination in an unweighted graph." BFS solves that. It always has.
The trick is that you build the neighborhood of each node on the fly: for a word of length M, you try all 26 * M single-letter substitutions and check whether each candidate lands in the dictionary. That is the whole algorithm.
The problem
Given two words beginWord and endWord and a dictionary wordList, find the number of words in the shortest transformation sequence from beginWord to endWord, where each step changes exactly one letter and every intermediate word must exist in wordList; return 0 if no such sequence exists. Full statement: LeetCode 127.
Constraints:
- 1 <= beginWord.length <= 10
- endWord.length == beginWord.length
- 1 <= wordList.length <= 5000
- wordList[i].length == beginWord.length
- beginWord, endWord, and wordList[i] consist of lowercase English letters.
- beginWord != endWord
- All the words in wordList are unique.What the constraints force
beginWord.length <= 10. This is the first thing that matters. Generating all neighbors of a word takes O(M * 26) work; at M = 10 that is 260 candidates per word — cheap. If M were in the hundreds, you would need a different neighbor-generation strategy (precompute wildcard patterns like *ot for hot). At M <= 10, brute-force character substitution is fine.
wordList.length <= 5000. That is the number of nodes in the graph. BFS visits each node at most once, so N = 5000 is entirely manageable. An O(N²) approach — checking all pairs of words for adjacency up front — would cost 25 million operations, which is also fine but unnecessary. The on-the-fly generation approach costs O(M * 26) per dequeue, not O(N), which is better when N is large relative to M * 26.
All words are lowercase English letters. The bounded alphabet (exactly 26 characters) means you loop over 'a' to 'z' with no special handling. No Unicode, no uppercase, no digits.
beginWord != endWord. This rules out the trivially-true case at the problem level. You do not need to check for it in your code, though you do need to check the more important early exit: if endWord is not in wordList, return 0 immediately, because no valid transformation sequence can end at a word that was never in the dictionary.
Approach 1: BFS on an implicit word graph
The canonical solution. Initialize a queue with (beginWord, 1) — the starting word and the current path length. Maintain a visited set to prevent revisiting words. At each step, dequeue a word, generate all 26 * M neighbors, and enqueue any neighbor that is in word_set and not yet visited. The moment you dequeue endWord, return its level. If the queue empties without finding it, return 0.
One correctness detail: mark a word as visited when you enqueue it, not when you dequeue it. If you wait until dequeue, you may add the same word to the queue multiple times from different parents, wasting O(N) extra enqueues at worst. The first path to reach a word is also the shortest (BFS property), so revisiting serves no purpose.
from collections import deque
from typing import List
class Solution:
def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:
if endWord not in wordList:
return 0
word_set = set(wordList)
queue = deque([(beginWord, 1)])
visited = {beginWord}
while queue:
current_word, level = queue.popleft()
if current_word == endWord:
return level
for i in range(len(current_word)):
for c in 'abcdefghijklmnopqrstuvwxyz':
new_word = current_word[:i] + c + current_word[i + 1:]
if new_word in word_set and new_word not in visited:
visited.add(new_word)
queue.append((new_word, level + 1))
return 0using System.Collections.Generic;
public class Solution {
public int LadderLength(string beginWord, string endWord, IList<string> wordList) {
var wordSet = new HashSet<string>(wordList);
if (!wordSet.Contains(endWord)) return 0;
var queue = new Queue<(string word, int level)>();
queue.Enqueue((beginWord, 1));
var visited = new HashSet<string> { beginWord };
while (queue.Count > 0) {
var (currentWord, level) = queue.Dequeue();
if (currentWord == endWord) return level;
char[] chars = currentWord.ToCharArray();
for (int i = 0; i < chars.Length; i++) {
char original = chars[i];
for (char c = 'a'; c <= 'z'; c++) {
chars[i] = c;
string newWord = new string(chars);
if (wordSet.Contains(newWord) && !visited.Contains(newWord)) {
visited.Add(newWord);
queue.Enqueue((newWord, level + 1));
}
}
chars[i] = original;
}
}
return 0;
}
}The C# version uses a char[] buffer instead of string concatenation, which matters slightly for performance since string is immutable in C# and concatenation allocates. In Python the current_word[:i] + c + current_word[i+1:] slice is idiomatic and fine.
Hand trace through Example 1
beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]
BFS expansion level by level:
Level 1: dequeue "hit"
neighbors: "hot" in word_set -> enqueue ("hot", 2)
(all other substitutions miss word_set)
Level 2: dequeue "hot"
neighbors: "dot" in word_set -> enqueue ("dot", 3)
"lot" in word_set -> enqueue ("lot", 3)
Level 3: dequeue "dot"
neighbors: "dog" in word_set -> enqueue ("dog", 4)
dequeue "lot"
neighbors: "log" in word_set -> enqueue ("log", 4)
Level 4: dequeue "dog"
neighbors: "cog" in word_set -> enqueue ("cog", 5)
dequeue "log"
neighbors: "cog" already in visited -> skip
Level 5: dequeue "cog" -> current_word == endWord -> return 5
Result: 5. The path is hit -> hot -> dot -> dog -> cog.
Complexity
Time: O(M² * N). For each of the N words, we generate M * 26 candidate neighbors. Building each candidate costs O(M) (string slicing or char array copy). So: N * M * 26 * M = O(M² * N). In practice the BFS terminates early, but worst case is a long chain through all N words.
Space: O(M * N). The word_set holds N strings of length M each. The visited set is the same. The queue holds at most N entries, each an O(M) string.
Approach 2: Bidirectional BFS
BFS from one end explores a frontier that grows roughly as b^d where b is the branching factor (neighbors per word) and d is the shortest path length. Bidirectional BFS simultaneously expands from beginWord and endWord, meeting in the middle. Each side explores only b^(d/2), so the total work is roughly 2 * b^(d/2) instead of b^d. For large graphs or long paths, this is a significant practical speedup even though the asymptotic class is the same.
The implementation uses two sets — begin_set and end_set — instead of a queue. Each iteration, we expand one set by one level. The termination condition is: if a generated neighbor appears in the other set, the two frontiers have met, and the answer is level + 1.
One key optimization: always expand the smaller set. The larger set's size gives an upper bound on how many neighbors we will generate. By always picking the smaller side, we minimize the per-iteration work.
from typing import List
class Solution:
def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:
if endWord not in wordList:
return 0
word_set = set(wordList)
begin_set = {beginWord}
end_set = {endWord}
visited = set()
level = 1
while begin_set and end_set:
# Always expand the smaller frontier
if len(begin_set) > len(end_set):
begin_set, end_set = end_set, begin_set
next_set = set()
for word in begin_set:
for i in range(len(word)):
for c in 'abcdefghijklmnopqrstuvwxyz':
new_word = word[:i] + c + word[i + 1:]
if new_word in end_set:
return level + 1
if new_word in word_set and new_word not in visited:
visited.add(new_word)
next_set.add(new_word)
begin_set = next_set
level += 1
return 0using System.Collections.Generic;
public class Solution {
public int LadderLength(string beginWord, string endWord, IList<string> wordList) {
var wordSet = new HashSet<string>(wordList);
if (!wordSet.Contains(endWord)) return 0;
var beginSet = new HashSet<string> { beginWord };
var endSet = new HashSet<string> { endWord };
var visited = new HashSet<string>();
int level = 1;
while (beginSet.Count > 0 && endSet.Count > 0) {
// Always expand the smaller frontier
if (beginSet.Count > endSet.Count) {
var tmp = beginSet;
beginSet = endSet;
endSet = tmp;
}
var nextSet = new HashSet<string>();
foreach (var word in beginSet) {
char[] chars = word.ToCharArray();
for (int i = 0; i < chars.Length; i++) {
char original = chars[i];
for (char c = 'a'; c <= 'z'; c++) {
chars[i] = c;
string newWord = new string(chars);
if (endSet.Contains(newWord)) return level + 1;
if (wordSet.Contains(newWord) && !visited.Contains(newWord)) {
visited.Add(newWord);
nextSet.Add(newWord);
}
}
chars[i] = original;
}
}
beginSet = nextSet;
level++;
}
return 0;
}
}The bidirectional version does not track levels inside queue entries. Instead, level is an integer that increments once per full expansion round. When the frontiers meet (new_word in end_set), the answer is level + 1 because the current side has expanded level layers and the meeting word is one step into the other side's already-expanded territory.
Complexity
Time: O(M² * N) asymptotically — same bound — but with a smaller constant. In the worst case (no path exists), we still examine all words. In the average case with a valid path of length d, the bidirectional version explores roughly 2 * b^(d/2) nodes versus b^d for standard BFS. With b around 26 _ 10 = 260 and d = 5, that is 2 _ 260^2.5 ≈ 2.2Mversus260^5 ≈ 1.2 trillion — the bidirectional version is incomparably faster on deep graphs.
Space: O(M * N) — same class. Two frontier sets plus visited, each holding strings of length M.
Edge cases that bite
endWord not in wordList: return 0 immediately. This is the first check in both solutions. Skipping it means BFS will exhaust the entire dictionary trying to reach an unreachable target.
No path exists but endWord is in wordList: BFS (or bidirectional BFS) drains completely without the frontiers meeting. Return 0. Example: beginWord = "hit", wordList = ["hot","dot","dog","lot","log"] with no "cog" — even though there are valid intermediate words, you cannot reach endWord because endWord was not there in Example 2.
beginWord is already one substitution away from endWord: the shortest path has length 2. The first dequeue in BFS generates endWord as a neighbor and returns immediately. The bidirectional version returns level + 1 = 2 on the first expansion round (level starts at 1). Both handle this correctly without special-casing.
wordList = [endWord] only: if beginWord can reach endWord in one step, return 2. Otherwise return 0. No intermediate words means either there is a direct edge or there is nothing.
All words in wordList are identical to beginWord: every generated neighbor is beginWord itself or some variant, but since beginWord is in visited from initialization, nothing gets enqueued. If endWord cannot be reached through the substitutions, return 0. No infinite loops.
Marking visited at dequeue instead of enqueue: not technically wrong for correctness (BFS still finds shortest path), but catastrophically slow. In a dense graph, the same word can be enqueued from O(N) different parents before it is first dequeued. The queue blows up to O(N²) size. Always mark at enqueue time.
Which approach to write in an interview
I reach for standard BFS. It is 15-20 lines, the invariant is clear (BFS finds shortest path in unweighted graphs), and the implementation is easy to verify on the board. For N = 5000 and M = 10, standard BFS is fast enough — at most 5000 * 10 * 26 = 1.3M operations.
Bidirectional BFS is the answer when an interviewer asks "can you do better?" or when the graph is deep and wide enough that standard BFS is visibly slow. The key idea to state: you are cutting the exponentially growing frontier in half by attacking from both ends. The swap-smaller-set optimization is the detail that makes it work correctly in practice.
I would not ship bidirectional BFS in production without a comment explaining the level-tracking invariant. The level + 1 termination condition is the piece that trips people up — a careful comment saves the next reader ten minutes.
Comparison table
| Approach | Time | Space | When to use |
|---|---|---|---|
| BFS | O(M² * N) | O(M * N) | Default. Simple, fast enough for the given constraints. |
| Bidirectional BFS | O(M² * N) asymptotically, much faster in practice | O(M * N) | Deep paths, large graphs where standard BFS is visibly slow. |
The asymptotic class is the same. The practical difference grows exponentially with path length.
The transferable pattern: implicit graph BFS
Word Ladder is a template for a class of problems where the graph is never stored — only the rule for deriving neighbors matters. The state is a node; the transition rule is the edge function; BFS finds the shortest sequence of transitions.
Recognizing this pattern is the real skill. Whenever you see "minimum steps," "shortest sequence," or "minimum transformations" with a well-defined step rule, the answer is almost always BFS on an implicit state graph.
Where this fits in the graphs series
The four articles before this — Number of Islands, Clone Graph, Course Schedule, and Rotting Oranges — all operated on explicitly given graphs: adjacency lists, grids with direct coordinates, or prebuilt matrices. Word Ladder is the first problem here where the graph structure must be derived on the fly from the input domain. That is the conceptual step up.
The neighbors of "hot" are not listed anywhere in the input. You generate them by systematically trying every single-character substitution. This is the difference between "traverse a given graph" and "model your problem as a graph and then traverse it."
Minimum Genetic Mutation (433) is the same problem, same structure, with a four-character alphabet (A, C, G, T) and a gene bank instead of a word list. If you can solve Word Ladder, 433 is essentially free.
Open the Lock (752) applies the same implicit-graph BFS to a combination lock. The "words" are four-digit strings, each "character" can rotate up or down, and there are dead ends to avoid. The branching factor is 8 (four positions, two directions each), and the state space is 10^4 = 10000. Pure BFS, same pattern.
Word Ladder II (126) asks for all shortest paths, not just the length. This changes the problem significantly — you cannot stop at the first arrival at endWord, you must track all parent pointers and reconstruct every shortest path. The BFS structure is similar but the bookkeeping is substantially heavier. If 127 feels comfortable, 126 is the natural next challenge.
Shortest Path with Alternating Colors (1129) adds a constraint on the edge type (each step must alternate between red and blue edges), which requires the state to encode not just the current node but also the last edge color used. Same implicit-graph BFS, but the state is (node, last_color) instead of just node.
Occasional notes on what I'm building
Get an email when I publish a new post — engineering write-ups, no spam. Unsubscribe anytime.
Comments
No comments yet. Be the first.
