Median of Two Sorted Arrays: binary search on a partition, not a value
The constraint is O(log(m+n)). That one line rules out almost every approach you might reach for first — sorting, scanning, merging — and points you directly at binary search. But binary search on what? There is no value to search for here; you want the median, which is a position, not a target. The insight the problem is testing is that you can binary search on a partition point rather than a value. Everything else falls out from that.
The problem
Given two sorted arrays nums1 and nums2 of sizes m and n, return the median of the combined elements in O(log(m+n)) time. Full statement: LeetCode 4.
Constraints:
- nums1.length == m
- nums2.length == n
- 0 <= m <= 1000
- 0 <= n <= 1000
- 1 <= m + n <= 2000
- -10^6 <= nums1[i], nums2[i] <= 10^6What the constraints actually force
m and n are each at most 1000, and m + n >= 1. Both arrays are already sorted. Values span [-10^6, 10^6].
The sorted constraint is doing most of the work. A sorted array means you can binary search — not just for values, but for positions. If you pick a cut point in nums1, the corresponding cut point in nums2 is determined: together the left halves must contain exactly (m+n)/2 elements (or (m+n+1)/2 if you are handling odd totals). So you are only searching over one dimension even though you have two arrays. That collapses the search space from m * n down to O(min(m, n)).
The value bounds [-10^6, 10^6] have one practical consequence: you need sentinel infinities at the array boundaries. When a partition puts the cut at the very start or end of an array, there is no actual left maximum or right minimum — you need +inf and -inf as stand-ins. Both Python and C# make this clean.
Approach 1: Merge and find median
The straightforward reading of the problem — merge both sorted arrays, then pick the middle — is O(m+n) time and O(m+n) space. It is not what the problem asks for, but it is a correct baseline and worth implementing first to make sure you understand the median computation before adding binary search complexity on top.
The merge is the same two-pointer sweep from merge sort:
class Solution:
def findMedianSortedArrays(self, nums1, nums2):
merged = []
i, j = 0, 0
while i < len(nums1) and j < len(nums2):
if nums1[i] <= nums2[j]:
merged.append(nums1[i])
i += 1
else:
merged.append(nums2[j])
j += 1
while i < len(nums1):
merged.append(nums1[i])
i += 1
while j < len(nums2):
merged.append(nums2[j])
j += 1
n = len(merged)
if n % 2 == 1:
return float(merged[n // 2])
else:
return (merged[n // 2 - 1] + merged[n // 2]) / 2.0public class Solution {
public double FindMedianSortedArrays(int[] nums1, int[] nums2) {
var merged = new List<int>();
int i = 0, j = 0;
while (i < nums1.Length && j < nums2.Length) {
if (nums1[i] <= nums2[j]) merged.Add(nums1[i++]);
else merged.Add(nums2[j++]);
}
while (i < nums1.Length) merged.Add(nums1[i++]);
while (j < nums2.Length) merged.Add(nums2[j++]);
int n = merged.Count;
if (n % 2 == 1) return (double)merged[n / 2];
return (merged[n / 2 - 1] + merged[n / 2]) / 2.0;
}
}The odd/even split in the final step is the only detail that bites people: for an odd total the median is the single middle element at index n // 2; for an even total it is the average of indices n // 2 - 1 and n // 2. Get this wrong and every output is off by half a step.
Time: O(m + n). You touch every element once during the merge.
Space: O(m + n). The merged list holds all elements.
This fails the problem's time constraint. But it is the right first step — confirm the median logic on examples before moving to binary search.
Approach 2: Binary search on the partition point
Here is the core idea. You want to split the combined m + n elements into a left half and a right half such that every element in the left half is less than or equal to every element in the right half. Once that split exists, the median is determined: max(left) for odd totals, (max(left) + min(right)) / 2 for even totals.
You never have to find the split by trying all positions. Because both arrays are sorted, if you pick cut1 elements from nums1's left and cut2 elements from nums2's left (where cut1 + cut2 == half), then the only thing you need to check is whether the boundary elements cross over:
nums1[cut1 - 1] <= nums2[cut2](left max of nums1 does not exceed right min of nums2)nums2[cut2 - 1] <= nums1[cut1](left max of nums2 does not exceed right min of nums1)
If both hold, you have the right partition. If the first fails, cut1 is too big — slide it left. If the second fails, cut1 is too small — slide it right. That is a binary search loop.
Always binary search on the shorter array. That gives O(log(min(m,n))) iterations, and it ensures the corresponding cut on the longer array is always in bounds.
class Solution:
def findMedianSortedArrays(self, nums1, nums2):
# Always binary search on the shorter array
if len(nums1) > len(nums2):
nums1, nums2 = nums2, nums1
m, n = len(nums1), len(nums2)
total = m + n
half = total // 2 # left half target size
left, right = 0, m
while left <= right:
cut1 = (left + right) // 2
cut2 = half - cut1
# Boundary sentinels handle empty partitions cleanly
maxLeft1 = float('-inf') if cut1 == 0 else nums1[cut1 - 1]
minRight1 = float('inf') if cut1 == m else nums1[cut1]
maxLeft2 = float('-inf') if cut2 == 0 else nums2[cut2 - 1]
minRight2 = float('inf') if cut2 == n else nums2[cut2]
if maxLeft1 <= minRight2 and maxLeft2 <= minRight1:
# Correct partition found
if total % 2 == 1:
return float(min(minRight1, minRight2))
else:
return (max(maxLeft1, maxLeft2) + min(minRight1, minRight2)) / 2.0
elif maxLeft1 > minRight2:
right = cut1 - 1 # cut1 too far right
else:
left = cut1 + 1 # cut1 too far left
return 0.0 # unreachable with valid inputpublic class Solution {
public double FindMedianSortedArrays(int[] nums1, int[] nums2) {
// Always binary search on the shorter array
if (nums1.Length > nums2.Length) {
int[] tmp = nums1; nums1 = nums2; nums2 = tmp;
}
int m = nums1.Length, n = nums2.Length;
int total = m + n;
int half = total / 2;
int left = 0, right = m;
while (left <= right) {
int cut1 = (left + right) / 2;
int cut2 = half - cut1;
int maxLeft1 = cut1 == 0 ? int.MinValue : nums1[cut1 - 1];
int minRight1 = cut1 == m ? int.MaxValue : nums1[cut1];
int maxLeft2 = cut2 == 0 ? int.MinValue : nums2[cut2 - 1];
int minRight2 = cut2 == n ? int.MaxValue : nums2[cut2];
if (maxLeft1 <= minRight2 && maxLeft2 <= minRight1) {
if (total % 2 == 1)
return Math.Min(minRight1, minRight2);
return (Math.Max(maxLeft1, maxLeft2) + Math.Min(minRight1, minRight2)) / 2.0;
}
else if (maxLeft1 > minRight2)
right = cut1 - 1;
else
left = cut1 + 1;
}
return 0.0;
}
}Note the C# version uses int.MinValue and int.MaxValue as sentinels instead of float infinities. This works for the comparison logic, but watch: if total % 2 == 0 and maxLeft1 = int.MaxValue, the average computation overflows. In practice, the search terminates before hitting that case — but if you are being careful, use long for the average computation or the Python version's float approach.
Time: O(log(min(m,n))). Each iteration halves the search range on the shorter array.
Space: O(1). No auxiliary array; only a handful of integer variables.
Partition trace: Example 1, nums1 = [1,3], nums2 = [2]
Total = 3 (odd), half = 1. Binary search on nums1 (length 2).
The partition after position 1 in nums1 gives left = [1] from nums1, left = [] from nums2. The right halves are [3] and [2]. max(1, -inf) = 1, min(3, 2) = 2. Since total is odd we return min(minRight1, minRight2) = 2. Correct.
Partition trace: Example 2, nums1 = [1,2], nums2 = [3,4]
Total = 4 (even), half = 2. Binary search on nums1 (both length 2, does not matter).
Two iterations. The first cut (cut1=1) puts [1] from nums1 and [3] from nums2 on the left — but maxLeft2 = 3 > minRight1 = 2, which means nums2's left side has elements that are too large. Slide cut1 right. The second cut (cut1=2, cut2=0) takes all of nums1 on the left and nothing from nums2 on the left. Now maxLeft1 = 2, minRight2 = 3. Everything checks out, and the median is (2 + 3) / 2 = 2.5.
Edge cases that actually matter
One array is empty (nums1 = [], nums2 = [5]): After the swap, nums1 is still [] with m = 0. The binary search loop runs once with cut1 = 0, cut2 = half. The cut1 == 0 sentinel fires, maxLeft1 = -inf. The partition condition holds immediately. The median is computed from nums2 alone. Correct.
All elements of one array are smaller than all elements of the other (nums1 = [1,2], nums2 = [3,4]): The binary search converges to cut1 = 2, cut2 = 0, which is the "take all of nums1, none of nums2" partition. Example 2 above traces exactly this case.
Duplicate elements across arrays (nums1 = [1,1], nums2 = [1,1]): The partition condition uses <=, not <. Duplicates at the boundary are fine — maxLeft1 <= minRight2 holds when they are equal. The median computes correctly as 1.0.
Single-element arrays (nums1 = [1], nums2 = [2]): Total is 2 (even). One iteration: cut1 = 0 or cut1 = 1 depending on whether the single-element comparison passes. With cut1 = 0, cut2 = 1, maxLeft1 = -inf, minRight1 = 1, maxLeft2 = 2, minRight2 = +inf. Check: -inf <= +inf and 2 <= 1? No — slide right. cut1 = 1, cut2 = 0: maxLeft1 = 1, minRight1 = +inf, maxLeft2 = -inf, minRight2 = 2. Check: 1 <= 2 and -inf <= +inf. Yes. Median is (1 + 2) / 2 = 1.5.
Total is odd vs even: When total % 2 == 1, the median is the single middle element — min(minRight1, minRight2). When even, it is the average of the largest-left and smallest-right. Getting this formula wrong is the most common bug when adapting code from the even-only or odd-only case.
Comparison table
| Approach | Time | Space | When to reach for it |
|---|---|---|---|
| Merge + find median | O(m+n) | O(m+n) | Interviews where you demonstrate understanding before optimizing; tiny inputs |
| Binary search on partition | O(log(min(m,n))) | O(1) | Whenever constraints demand log time; this is the intended solution |
The merge approach is not just slower — it allocates an array as large as the union of both inputs. If m + n = 2000 that is fine. If you imagine scaling this to streaming data or very large arrays, the allocation is a genuine problem. The binary search version uses a fixed handful of variables regardless of input size.
The pattern this problem teaches
This is the clearest example in the binary-search series of what "binary search on the answer space" means as a technique. The value you are searching for is not in the array — it is the partition point itself. You define an invariant (the partition is valid when the two cross-boundary conditions hold), you have a monotone adjustment rule (slide cut1 left when maxLeft1 is too big, slide right otherwise), and the binary search converges.
That same structure appears in many problems that look nothing like a search problem at first: "find the minimum capacity such that you can ship all packages in D days," "find the smallest divisor such that the sum does not exceed a threshold," "find the kth smallest element in a sorted matrix." They all reduce to: define what valid means, binary search for the boundary, extract the answer.
I would ship the binary search version. The merge is a useful stepping stone to verify your median logic, but once you have it correct, there is no reason to keep it. The binary search is O(1) space and the constraint says O(log(m+n)) — it is what the problem is designed to reward.
Where this sits in the binary search series
This is the hardest problem in the series so far. The earlier problems — classic binary search, search in rotated sorted array, find minimum in rotated array, search a 2D matrix — all search for a value at a position. This one searches for a position that satisfies a condition about values. That inversion is the conceptual leap.
Find K-th Smallest Element in Two Sorted Arrays (LeetCode 373 variant, or the generalization of this problem): median is just the k = (m+n)/2-th element. The binary search generalizes directly to any k, not just the middle.
Kth Smallest Element in a Sorted Matrix (LeetCode 378): binary search on the value space, use a count function to check how many elements are <= mid. Different axis of search, same structure.
Find K-th Smallest Pair Distance (LeetCode 719): binary search on distance, count pairs with distance <= mid. The same "binary search on the answer, use a feasibility check" template.
Find Median from Data Stream (LeetCode 295): if elements arrive one at a time rather than in two static arrays, you need two heaps — a max-heap for the left half and a min-heap for the right half. The invariant is the same (left max <= right min), but the data structure is different because you cannot random-access a stream.
Part of the series: Binary Search
Occasional notes on what I'm building
Get an email when I publish a new post — engineering write-ups, no spam. Unsubscribe anytime.
Comments
No comments yet. Be the first.
