LC4. Median of Two Sorted Arrays¶
Problem Description¶
LeetCode Problem 4: Given two sorted arrays nums1
and nums2
of size m
and n
respectively, return the median of the two sorted arrays.
The overall run time complexity should be O(log (m+n))
.
Clarification¶
- sorted vs. unsorted?
- different sizes?
- any duplicates?
- median definition?
- data type?
- time complexity requirement?
Assumption¶
- At least one array is not emepty
Solution¶
Start from median definition to find solutions.
Median definition: the median is the "middle" of a sorted list of numbers. \(\text{median}(x) = (x \lfloor (n+1)/2 \rfloor + x \lceil (n+1)/2 \rceil)/2\), where \(x\) is an ordered list of \(n\) numbers, and \(\lfloor \cdot \rfloor\) and \(\lceil \cdot \rceil\) denotes the floor and ceiling functions, respectively.
- If n is odd, the median is \((n + 1)/2\) th element.
- If n is even, the median is the average of \((n + 1)/2\) th and \((n + 1)/2 + 1\) th elements
Approach - Two Pointers¶
Have two pointers for each array. Move the pointer with smaller value forward at each step. Continue moving the pointers until reaching half of the total number of elements. Based on median definition, calculate median value.
class Solution:
def findMedianSortedArrays(self, nums1: List[int], nums2: List[int]) -> float:
m, n = len(nums1), len(nums2)
i, j = 0, 0
value = 0
for count in range(0, (m + n) // 2 + 1):
value_prev = value
if i >= m:
value = nums2[j]
j += 1
elif j >= n:
value = nums1[i]
i += 1
else:
if nums1[i] <= nums2[j]:
value = nums1[i]
i += 1
else:
value = nums2[j]
j += 1
if (m + n) % 2 == 1:
return value
else:
return (value + value_prev) / 2
Complexity Analysis¶
- Time complexity: \(O(m + n)\)
Since moving pointer one step per execution, it takes \((m + n) // 2\) steps to find the median. So the time complexity is \(O(m + n)\). - Space complexity: \(O(1)\)
Use several variables.
Approach - Find k-th smallest¶
Based on the median definition, we can convert the problem to find \((m + n + 1)/2\)-th smallest elements, where \(m\) and \(n\) are total number of array 1 and array 2, respectively.
- Idea: In every iteration, compare first k/2 elements of array A with the first k/2 elements of array B and delete k/2 elements.
- How to reduce search size? k -> k/2 -> (k - k/2)/2 ... -> 1
- How to delete k/2? If
a[aLeft + k/2 - 1] <= b[bLeft + k/2 - 1]
, delete elements froma[aLeft]
toa[aLeft + k/2 - 1]
sincea[aLeft + k/2 - 1]
is smaller than the k-th smallest element and therefore the k-th smallest element can not be amonga[aLeft]
toa[aLeft + k/2 - 1]
.aLeft
is the array A's left border to consider from.bLeft
is the array B's left border to consider from. Note the total number of elements in comparison is k (k/2 elements fromaLeft
toaLeft + k/2 - 1
and k/2 elements frombLeft
toaLeft + k/2 - 1
). if k-th smallest elements is amonga[aLeft]
toa[aLeft + k/2 - 1]
(assuminga[m]
), it means that there shouldn't be any element that is larger thana[m]
. However, at leastb[bLeft + k/2 - 1]
is larger thana[m]
sinceb[bLeft + k/2 - 1] >= a[aLeft + k/2 - 1] >= a[m]
. -
Base cases:
- If
aLeft > a.length
(finish searching array a), returnb[bLeft + k - 1]
, the k-th smallest element (must in b). - If
bLeft > b.length
(finish searching array b), returnb[aLeft + k - 1]
, the k-th smallest element (must in a). - If
k == 1
, the k-th smallest elements should be the smaller one ofa[aLeft]
andb[bLeft]
. Remember the search sapce is reduced from k -> k/2 -> ... -> 1. The k-th smallest elements should be found whenk == 1
- If
-
Corner cases:
- If
a.length
is too small (doesn't have k/2 elements), remove elements from b first until a has k/2 elements after reducing search space (k). - If
b.length
is too small ((doesn't have k/2 elements)), remove elements from a first until b has k/2 elements after reducing search space (k).
- If
class Solution {
public double findMedianSortedArrays(int[] nums1, int[] nums2) {
// if (nums1 == null || nums1.length == 0 || nums2 == null || nums2.length == 0)
// return Double.NaN;
int k = (nums1.length + nums2.length + 1)/2;
if ((nums1.length + nums2.length)%2 == 0)
return ((double) findKSmallest(nums1, nums2, k) + (double) findKSmallest(nums1, nums2, k+1))/2;
else
return findKSmallest(nums1, nums2, k);
}
public int findKSmallest(int[] nums1, int[] nums2, int k) {
int n1 = nums1.length;
int n2 = nums2.length;
int i = 0; // starting index of nums1 array
int j = 0; // starting index of nums2 array
while (k >= 1) {
// Eliminate k elements based on comparison results
if (i >= n1) { // finish searching all elments in nums1
return nums2[j + k - 1];
}
else if (j >= n2) { // finish searching all elments in nums1
return nums1[i + k - 1];
}
else if (k == 1)
return nums1[i] <= nums2[j] ? nums1[i] : nums2[j];
else if (i + k/2 - 1 >= n1) // not enough elements in nums1
j = j + k/2;
else if (j + k/2 - 1 >= n2) // not enought elements in nums2
i = i + k/2;
else if (nums1[i + k/2 - 1] <= nums2[j + k/2 - 1])
i = i + k/2;
else
j = j + k/2;
k = k - k/2; // update k to search in next iteration
}
return 0;
}
}
Complexity Analysis¶
- Time complexity: \(O(\log (m+n))\)
Since binary search is used to find \((m + n + 1)/2\) th smallest element, the time complexity is \(O(\log ((m + n + 1)/2)) \rightarrow O(\log (m+n))\) - Space complexity: \(O(1)\) as using several internal variables.
Approach - Binary Search with Partition¶
Median property: the median is the value separating the higher half from the lower half.
References:
Based on the the use of median for dividing, we can find partition position at two arrays and obtain the median value: 1. Cut A into two parts at a random position i:
Since A has m elements, so there are m + 1 cuttings (i = 0 ~ m). left_A length is i and right_A length is m - i. Note wheni = 0
, left_A is empty, and when i = m
, right_A is empty.
-
Similarly cut B into two parts at a random position j:
Since B has n elements, left_B length is j and right_B length is m - j. Note whenj = 0
, left_B is empty, and whenj = n
, right_B is empty. -
Put left_A and left_B into one set, left_part, and put right_A and right_B into another set, right_part:
If we can ensure: -
len(left_part) == len(right_part)
for even number of (m+n), orlen(left_part) == len(right_part) + 1
for odd number of (m+n)
i + j = m - i + n - j
whenm + n
is even;i + j = m - i + n - j + 1
whenm + n
is odd. ifn >= m
, we can simplifyi = 0 ~ m
andj = (m + n + 1)/2 - i
. Note(m + n + 1)/2
works for both odd and even case since integer divide results are the same between \(\frac{m+n}{2}\) and \(\frac{m+n+1}{2}\). max(left_part) <= min(right_part)
Just checkB[j - 1] <= A[i]
andA[i - 1] <= B[j]
. Since A and B are sorted,A[i - 1] <= A[i]
andB[j - 1] <= B[j]
and therefore no need to compare.
Then the median can compute from the middle 4 elements (A[i-1]
, B[j-1]
, A[i]
, and B[j]
).
max(A[i - 1], B[j - 1])
, when m + n is odd(max(A[i -1], B[j - 1]) + min(A[i], B[j]))/2
, when m + n is even
The algorithm is simplified as:
Searching i in [0, m] to find an object i such that:
B[j - 1] <= A[i] and A[i - 1] <= B[j], where j = (m + n + 1)/2 - i
B[j - 1] <= A[i]
andA[i - 1] <= B[j]
The object is found and stop searching.B[j - 1] > A[i]
Means A[i] is too small. We need to adjust i to getB[j - 1] <= A[i]
.- Can we increase i?
Yes, Because when i is increased (A[i] is increased), j will be decreased (B[j - 1] is decreased). So the search range is changed to [i + 1, right]. - Can we decrease i?
No! Because when i is decreased (A[i] is decreased further), j will be increased (B[j - 1] is increased further). Therefore,B[j - 1] <= A[i]
will be never satisfied.
- Can we increase i?
A[i - 1] > B[j]
Means A[i - 1] is too big. And we must decrease i to getA[i - 1] <= B[j]
. So the search range is changed to [left, i - 1].
Edge Case: A[i - 1]
doesn't exist when i == 0
; A[i]
doesn't exist when i == m
; A[j - 1]
doesn't exist when j == 0
, A[j]
doesn't exist when j == n
. We can ignore non-exist elements in the calculation and comparison.
class Solution {
public double findMedianSortedArrays(int[] nums1, int[] nums2) {
if ((nums1 == null || nums1.length == 0) && (nums2 == null || nums2.length == 0))
return Double.NaN;
if (nums1.length > nums2.length)
return findMedianSortedArrays(nums2, nums1);
int m = nums1.length;
int n = nums2.length;
int left = 0;
int right = m; // not m - 1, since this is partition position, which can be 0 ~ m (inclusive)
int i; // partition position for array nums 1
int j; // partition position for array nums 2
// Binary search of partition position
while (left <= right) {
i = left + (right - left)/2;
j = (m + n + 1)/2 - i; // (m + n + 1)/2 is the index of median
// handle edeg cases:
int maxLeft1 = i == 0 ? Integer.MIN_VALUE : nums1[i - 1];
int maxLeft2 = j == 0 ? Integer.MIN_VALUE : nums2[j - 1];
int minRight1 = i == m ? Integer.MAX_VALUE : nums1[i];
int minRight2 = j == n ? Integer.MAX_VALUE : nums2[j];
// Corner case: i == 0 or m, j == 0 or n
if (maxLeft1 > minRight2)
right = i - 1;
else if (maxLeft2 > minRight1)
left = i + 1;
else { // found: nums1[i - 1] <= nums2[j], nums2[i - 1] <= nums1[i]
if ((m + n)%2 == 0) // even number
return (double) (Math.max(maxLeft1, maxLeft2) + Math.min(minRight1, minRight2))/2;
else
return (double) Math.max(maxLeft1, maxLeft2);
}
}
return Double.NaN;
}
}
class Solution:
def findMedianSortedArrays(self, nums1: List[int], nums2: List[int]) -> float:
if (not nums1 or len(nums1) == 0) and (not nums2 or len(nums2) == 0):
return float("nan")
if len(nums1) > len(nums2):
return self.findMedianSortedArrays(nums2, nums1)
m, n = len(nums1), len(nums2)
left, right = 0, m
while left <= right:
i = (left + right) // 2
j = (m + n + 1) // 2 - i # (m + n + 1) / 2 is the index of median
# Handle edge case
if i == 0:
max_left_1 = float('-inf') # (1)
else:
max_left_1 = nums1[i - 1]
if j == 0:
max_left_2 = float('-inf')
else:
max_left_2 = nums2[j - 1]
if i == m:
min_right_1 = float('inf') # (2)
else:
min_right_1 = nums1[i]
if j == n:
min_right_2 = float('inf')
else:
min_right_2 = nums2[j]
if max_left_1 > min_right_2:
right = i - 1
elif max_left_2 > min_right_1:
left = i + 1
else:
if (m + n) % 2 == 0:
return (max(max_left_1, max_left_2) + min(min_right_1, min_right_2)) / 2
else:
return max(max_left_1, max_left_2)
return float("nan")
- Python has no integer min.
- Python has no integer max.
Complexity Analysis¶
- Time complexity: \(O(\log (\min(m, n)))\)
Since we only binary search the array with smaller the size, the time complexity is \(O(\log (\min(m, n)))\) - Space complexity: \(O(1)\) as using several internal variables.
Complexity Summary¶
Time Complexity | Space Complexity | |
---|---|---|
Approach - Two pointers | \(O(m + n)\) | \(O(1)\) |
Approach - Find kth smallest | \(O(\log (m + n))\) | \(O(1)\) |
Approach - Binary search with partition | \(O(\log (\min(m, n)))\) | \(O(1)\) |
Test¶
- Test corner cases
- Test general cases: total number is odd and even