Design Twitter: merge k sorted timelines with a heap
This problem feels like a mini systems design question — build a Twitter clone with postTweet, follow, unfollow, and getNewsFeed. But once you strip out the product language, the core puzzle is: given up to 500 users, each holding a personal chronological timeline, produce the 10 most recent tweets across the requesting user and everyone they follow. That is a top-10 merge of k sorted lists. The data-structure toolbox you reach for determines whether getNewsFeed costs O(N log N) on total tweet volume or O(F log F) on the followee count.
The problem
Design a simplified version of Twitter where users can post tweets, follow or unfollow other users, and retrieve the 10 most recent tweets in their news feed — combining their own tweets with those of everyone they follow. Full statement: LeetCode 355.
Constraints:
- 1 <= userId, followerId, followeeId <= 500
- 0 <= tweetId <= 10^4
- All the tweets have unique IDs.
- At most 3 * 10^4 calls will be made to postTweet, getNewsFeed, follow, and unfollow.
- A user cannot follow himself.What the constraints force
1 <= userId, followerId, followeeId <= 500. At most 500 distinct users. A hash map keyed on user ID is the obvious choice. The bound also tells you the heap in the optimal solution never holds more than 500 entries — one per followee — so getNewsFeed is bounded by O(500 * log 500), a constant in practice.
0 <= tweetId <= 10^4. TweetId values are small, but the problem says each call to postTweet uses a unique tweetId. You cannot use the tweetId itself as a timestamp, because uniqueness within [0, 10^4] does not imply chronological ordering — tweet ID 3 could be posted after tweet ID 7. You need a separate monotonic counter.
At most 3 * 10^4 calls across all four operations. With up to 30,000 postTweet calls, a naive approach that sorts all N tweets on every getNewsFeed call costs O(N log N) per call. With 30,000 posts and some number of feed reads, that O(N log N) per-call cost adds up fast. The heap approach avoids it entirely: getNewsFeed only touches the head of each followee's timeline.
Approach 1: Brute force — global list and sort on every read
The simplest thing that works: keep one flat list of all tweets globally, tagged with (userId, tweetId, timestamp). On each getNewsFeed, filter by who the user follows (plus themselves), sort descending by timestamp, return the first 10.
from typing import List
class Twitter:
def __init__(self):
self.tweets: List[tuple] = [] # (userId, tweetId, timestamp)
self.follows: dict = {} # userId -> set of followeeIds
self.timestamp: int = 0
def postTweet(self, userId: int, tweetId: int) -> None:
self.tweets.append((userId, tweetId, self.timestamp))
self.timestamp += 1
def getNewsFeed(self, userId: int) -> List[int]:
followees = self.follows.get(userId, set()).copy()
followees.add(userId)
relevant = [
(tweet_id, ts)
for uid, tweet_id, ts in self.tweets
if uid in followees
]
relevant.sort(key=lambda x: x[1], reverse=True)
return [tweet_id for tweet_id, _ in relevant[:10]]
def follow(self, followerId: int, followeeId: int) -> None:
if followerId == followeeId:
return
self.follows.setdefault(followerId, set()).add(followeeId)
def unfollow(self, followerId: int, followeeId: int) -> None:
if followerId in self.follows:
self.follows[followerId].discard(followeeId)using System;
using System.Collections.Generic;
using System.Linq;
public class Twitter {
private List<(int UserId, int TweetId, int Timestamp)> tweets;
private Dictionary<int, HashSet<int>> follows;
private int timestamp;
public Twitter() {
tweets = new List<(int, int, int)>();
follows = new Dictionary<int, HashSet<int>>();
timestamp = 0;
}
public void PostTweet(int userId, int tweetId) {
tweets.Add((userId, tweetId, timestamp++));
}
public IList<int> GetNewsFeed(int userId) {
var followees = new HashSet<int>();
if (follows.ContainsKey(userId))
foreach (var f in follows[userId]) followees.Add(f);
followees.Add(userId);
return tweets
.Where(t => followees.Contains(t.UserId))
.OrderByDescending(t => t.Timestamp)
.Take(10)
.Select(t => t.TweetId)
.ToList();
}
public void Follow(int followerId, int followeeId) {
if (followerId == followeeId) return;
if (!follows.ContainsKey(followerId))
follows[followerId] = new HashSet<int>();
follows[followerId].Add(followeeId);
}
public void Unfollow(int followerId, int followeeId) {
if (follows.ContainsKey(followerId))
follows[followerId].Remove(followeeId);
}
}Complexity. postTweet: O(1). getNewsFeed: O(N log N) where N is total tweets ever posted. follow/unfollow: O(1). Space: O(N + F) where F is total follow relationships.
The problem is clear: as the tweet count grows, every getNewsFeed pays O(N log N) even if the user only follows three people. The brute force touches the entire history just to find 10 items. The data you need — the most recent tweet per followee — is buried in unsorted noise.
Approach 2: Per-user linked lists merged with a heap
The insight: tweets for a single user always arrive in chronological order. If you store each user's tweets as a linked list (newest at the head), you already have F sorted sequences. Merging F sorted sequences for the top-10 elements is exactly the "merge k sorted lists" pattern — and a max heap does it in O(F log F) time for the first 10 entries without touching most of the data.
The structure is:
user_tweets: hash map from userId to the head of their tweet linked list.follows: hash map from userId to a set of followeeIds.- A global monotonic
timestampcounter, incremented on eachpostTweet.
Each tweet node holds its tweet_id, timestamp, and a next pointer to the user's previous tweet (i.e., the next older one).
import heapq
from typing import List, Optional
class Tweet:
def __init__(self, tweet_id: int, timestamp: int):
self.tweet_id = tweet_id
self.timestamp = timestamp
self.next: Optional['Tweet'] = None
class Twitter:
def __init__(self):
self.user_tweets: dict = {} # userId -> Tweet (head)
self.follows: dict = {} # userId -> set of followeeIds
self.timestamp: int = 0
def postTweet(self, userId: int, tweetId: int) -> None:
new_tweet = Tweet(tweetId, self.timestamp)
self.timestamp += 1
new_tweet.next = self.user_tweets.get(userId)
self.user_tweets[userId] = new_tweet
def getNewsFeed(self, userId: int) -> List[int]:
followees = self.follows.get(userId, set()).copy()
followees.add(userId)
# Max heap by timestamp (negate for Python's min-heap)
heap = []
for fid in followees:
head = self.user_tweets.get(fid)
if head:
heapq.heappush(heap, (-head.timestamp, head))
result = []
while heap and len(result) < 10:
_, tweet = heapq.heappop(heap)
result.append(tweet.tweet_id)
if tweet.next:
heapq.heappush(heap, (-tweet.next.timestamp, tweet.next))
return result
def follow(self, followerId: int, followeeId: int) -> None:
if followerId == followeeId:
return
self.follows.setdefault(followerId, set()).add(followeeId)
def unfollow(self, followerId: int, followeeId: int) -> None:
if followerId in self.follows:
self.follows[followerId].discard(followeeId)using System;
using System.Collections.Generic;
public class Tweet {
public int TweetId;
public int Timestamp;
public Tweet Next;
public Tweet(int tweetId, int timestamp) {
TweetId = tweetId;
Timestamp = timestamp;
Next = null;
}
}
public class Twitter {
private Dictionary<int, Tweet> userTweets;
private Dictionary<int, HashSet<int>> follows;
private int timestamp;
public Twitter() {
userTweets = new Dictionary<int, Tweet>();
follows = new Dictionary<int, HashSet<int>>();
timestamp = 0;
}
public void PostTweet(int userId, int tweetId) {
var newTweet = new Tweet(tweetId, timestamp++);
if (userTweets.ContainsKey(userId))
newTweet.Next = userTweets[userId];
userTweets[userId] = newTweet;
}
public IList<int> GetNewsFeed(int userId) {
var followees = new HashSet<int>();
if (follows.ContainsKey(userId))
foreach (var f in follows[userId]) followees.Add(f);
followees.Add(userId);
// C# PriorityQueue (min-heap by default); negate timestamp for max behavior
var pq = new PriorityQueue<Tweet, int>();
foreach (var fid in followees) {
if (userTweets.TryGetValue(fid, out var head))
pq.Enqueue(head, -head.Timestamp);
}
var result = new List<int>();
while (pq.Count > 0 && result.Count < 10) {
var tweet = pq.Dequeue();
result.Add(tweet.TweetId);
if (tweet.Next != null)
pq.Enqueue(tweet.Next, -tweet.Next.Timestamp);
}
return result;
}
public void Follow(int followerId, int followeeId) {
if (followerId == followeeId) return;
if (!follows.ContainsKey(followerId))
follows[followerId] = new HashSet<int>();
follows[followerId].Add(followeeId);
}
public void Unfollow(int followerId, int followeeId) {
if (follows.ContainsKey(followerId))
follows[followerId].Remove(followeeId);
}
}Complexity. postTweet: O(1) — prepend to linked list. getNewsFeed: O(F log F) for the initial heap build plus O(10 log F) for the 10 extraction steps, so O(F log F) overall. follow/unfollow: O(1). Space: O(T + F) where T is total tweets and F is total follow relationships.
Hand trace through the example
The example sequence:
postTweet(1, 5) # timestamp=0, user 1 head: [5, t=0]
getNewsFeed(1) # followees = {1}, heap: [(-0, tweet_5)], pop -> [5]
follow(1, 2) # follows[1] = {2}
postTweet(2, 6) # timestamp=1, user 2 head: [6, t=1]
getNewsFeed(1) # followees = {1, 2}
unfollow(1, 2) # follows[1] = {}
getNewsFeed(1) # followees = {1}
The interesting one is getNewsFeed(1) after both posts and the follow:
followees = {1, 2}
Build heap:
user 1 head: tweet(id=5, t=0) -> push (-0, tweet_5)
user 2 head: tweet(id=6, t=1) -> push (-1, tweet_6)
heap = [(-1, tweet_6), (-0, tweet_5)] # min-heap, -1 is smallest so tweet_6 comes out first
Pop 1: tweet_6 (t=1, id=6) -> result = [6], tweet_6.next = None, nothing pushed
Pop 2: tweet_5 (t=0, id=5) -> result = [6, 5], tweet_5.next = None, nothing pushed
heap empty, len(result) = 2 < 10, loop ends.
Return [6, 5]
After unfollow(1, 2), getNewsFeed(1) sees only followees = {1}. The heap starts with just tweet_5, pops it, and returns [5]. Tweet_6 is never touched because user 2 is not in the followee set.
Edge cases that actually matter
User with no tweets calls getNewsFeed: the user_tweets.get(fid) check returns None, so no entry is pushed to the heap. The result list stays empty and is returned correctly. No crash.
User follows nobody: follows.get(userId, set()) returns an empty set; after adding userId to it, the followees set is just {userId}. The feed shows only the user's own tweets. Correct.
unfollow called for a relationship that does not exist: discard in Python and Remove in C# are no-ops on non-existent elements. Neither throws. The source's if ... in self.follows guard is an extra layer but discard is sufficient.
Fewer than 10 tweets total: the while heap and len(result) < 10 loop exits when the heap empties. The result is a partial list, which is exactly what the problem specifies ("retrieves the 10 most recent" — fewer is fine if fewer exist).
User ID repeated in followees set: adding userId to the followees set is safe because we start from a copy. The set deduplicates automatically — if someone somehow tried to follow themselves and it slipped through, the set would still only look up their timeline once.
Self-follow guard: the constraint says "A user cannot follow himself," and our follow implementation enforces this with an early return on followerId == followeeId. The heap logic would not break if this guard were missing — the user already appears in the feedset — but it would do duplicate work and potentially corrupt the follows map.
Comparison table
| Approach | postTweet | getNewsFeed | follow/unfollow | Space |
|---|---|---|---|---|
| Brute force (global list + sort) | O(1) | O(N log N) | O(1) | O(N + F) |
| Per-user linked lists + heap | O(1) | O(F log F) | O(1) | O(T + F) |
N = total tweets ever posted. F = number of followees of the requesting user (bounded by 500). T = total tweets. The heap approach decouples getNewsFeed cost from total tweet volume — a user who follows 3 people pays for 3, not for 30,000 historical tweets.
The underlying pattern and when to reach for it
The mental model here is merge k sorted sequences with a heap. Each followee's timeline is an already-sorted sequence (timestamps monotonically decrease as you traverse next pointers). You do not need to sort them against each other globally — you just need the next-largest element at any point, and a heap of k heads gives you that in O(log k) time.
This pattern appears whenever you have multiple pre-sorted streams and need the top-m elements across them without reading everything. The heap acts as a priority cursor: it always points at the current front of each stream, advances the winner, and repeats until you have enough.
The linked-list choice for each user's timeline is deliberate. Prepend is O(1). You never need random access into a user's history — you always start from the head and walk forward as needed. An array would work too (append + iterate backward), but the linked list makes the "walk forward from the head" intent explicit and avoids index arithmetic.
What I'd ship and why
The heap approach, without hesitation. The brute force is fine for a 5-tweet demo but it degrades linearly with total tweet volume — and tweet volume is exactly what grows without bound in production. The heap approach bounds getNewsFeed to O(F log F) where F is capped at 500 by the problem. At worst, 500 * log(500) ~ 4,500 operations per call. That is a constant.
I would also note that the C# PriorityQueue (available since .NET 6) handles priority inversion cleanly by negating the timestamp at push time, exactly matching the Python heapq pattern. If the runtime is older than .NET 6, a SortedSet with a custom comparer works but requires careful tie-breaking to avoid two entries with the same timestamp collapsing into one.
In an actual interview, I'd code the brute force first to confirm the interface is right, then propose and implement the heap upgrade with an explicit "here's why the first version can't scale."
Where this sits in the heap series
The heap series so far has covered static selection problems: Last Stone Weight (1046), Kth Largest Element (215), K Closest Points (973), Task Scheduler (621). This problem adds a new dimension — the heap is not holding final answers, it is acting as a merge cursor across multiple independent streams. That is a genuinely different use of the data structure.
Merge k Sorted Lists (23) is the pure algorithmic form of what getNewsFeed does. There is no class, no follow graph — just k linked lists and a heap. If you want to isolate the merge step from the bookkeeping, solve 23 first.
Find Median from Data Stream (295) (LeetCode 295) also holds two heaps as cursors, but over a single sorted order split at the median. The "heap as position tracker rather than answer holder" idea is the same; the split is different.
Twitter Clone (this problem) with getNewsFeed is really "Merge k Sorted Lists" with a follow-graph filter deciding which k streams to merge. Strip out follow/unfollow and the problem collapses to 23.
Smallest Range Covering Elements from K Lists (632) (LeetCode 632) is the hardest generalization: instead of top-10, you need the smallest window that contains at least one element from every list. Same heap-of-heads skeleton, much more complex stopping condition.
Part of the series: Heap / Priority Queue
Occasional notes on what I'm building
Get an email when I publish a new post — engineering write-ups, no spam. Unsubscribe anytime.
Comments
No comments yet. Be the first.
