Intro
Intro
Joke: https://labuladong.gitbook.io/algo-en/iii.-algorithmic-thinking/detailedbinarysearch
gotcha in the above joke is that Binary Search is only applicable when you’re absolutely certain that the target element doesn’t lie in the half being discarded. Works for sorted numbers, but not for a pile of books :D
Templates
Binary Search impl can be extremely tricky because of checking and updating search space bounds. Ref
mid = floor(low+high)/2;
mid = floor(low+(high-low))/2; // avoids overflow
Two templates of writing BS (they differ by interval range [] vs [)):
- TEMPLATE#1 - the “closed interval” binary search (
[low, high]), the classic textbook binary search: ```cpp int low = 0, high = arr.size() - 1; // this makes it closed interval
while(low <= high){ int mid = low + (high - low) / 2;
if (k == arr[mid]) return mid; else if (k > arr[mid]) low = mid + 1; else high = mid - 1; }
// at this point in code, low = high + 1 // if method didn’t return till now ofc
return -1;
- **TEMPLATE#2** - the "half-open interval" binary search (`[low, high)`), C++ STL-like impl:
```cpp
int low = 0, high = arr.size(); // this makes it half-open interval
while(low < high){ // line 1
int mid = low + (high - low) / 2;
if (k == arr[mid]) return mid;
else if (k > arr[mid]) low = mid + 1;
else high = mid; // line 2
}
// at this point in code, low = high (because of line 1)
// if method didn't return till now ofc
return -1;
NOTICE: line 1 and line 2 go hand-in-hand to make sure we move correctly. Whenever in doubt, dry run the case [1 2 3] with k = 1. Half-open interval maybe counter-intuitive at times, so just avoid it completely!
Bounds
Lower Bound: lower bound of x is the smallest index i such that arr[i] >= x. Ex - in [2 4 5], lower bound of 3 is 4 (not 2) and lower bound of 4 is 4 itself. Also, observe that LB of 2 in [1,2,2,3] is 2 at index 1.
Upper Bound: upper bound of x is the smallest index i such that arr[i] > x. Ex - in [2 4 5], upper bound of 3 is 4 and upper bound of 4 is 5. Also, observe that UB of 2 in [1,2,2,3] is 3 at index 3.
Note that LB(x) = UB(x) if element x is not present in the array. Also, there maybe no LB/UB (return -1) if we go out of bounds searching for it.
One interesting observation for LB/UB is that when we break out of the while loop, the lower or upper bound is at index low, no matter whichever template we’re using! (clarification)
- keep
arr[mid] = kcondition on the direction we want to move in to skip duplicates (obvious). In LB we move leftwards in duplicates, in UB we move rightwards in duplicates. At the end,lowwill always end up at the answer. - if the LB or UB for a given
kdoes not exist in the array, then after the looplow == arr.size().
Related Problems:
- https://leetcode.com/problems/find-smallest-letter-greater-than-target
- Search Insert Position: (link) insert position is LB/UB only, depends on where question wants us to insert if element is already present in array.
- Problems on counting occurrences of an element: link, link
[!TIP] LB, UB, Floor, Ceil - Striver’s way of storing in
anson every potential candidate is very simple and intuitive than abovelowpointer approach (TEMPLATE#2).
Floor and Ceil of x: if target is 4, low and high will eventually converge at [2 5] and after two more steps, we’ll have our greatest number less than x at high and lowest number greater than x at low. Take care of equal to cases and duplicates using strategy discussed above (for lower and upper bound)
ceil(x)is equal toLB(x),floor(x)can be the number smaller thanxhere (unlikeLB) so its not atlow, it’s atlow - 1(iflow > 0)- if element
xis present in array thenfloor = x = ceil(ofc)
Find the first and last occurrence of a given number in a sorted array:
- First position: on
arr[mid] == kstoreans = midand reduce search space to left array only (high = mid - 1), rest is same as in normal BS. Alternatively, first position is LB if number is present, check LB index to detect presence. - Last position: on
arr[mid] == kstoreans = midand reduce search space to right array only (low = mid + 1), rest is same as in normal BS. Alternatively, first position is UB’s index - 1 if number is present, check UB index to detect presence.
Count occurrences of an element in a sorted array: count will be rightmost_index - leftmost_index + 1
- use previous approach to find leftmost and righmost indices (with or without LB/UB)
- another way but linear TC: find any occurrence of it using BS (if not found return
-1), either checkidx == -1that means occurrence is0.If a valididx, linearly scan its left half and right half for more occurrences (TC =O(logn) + O(n) => O(n))
1D Arrays
Observations for sorted and then rotated arrays e.g. [4,5,1,2,3]:
- there will be a single inflection point (aka pivot) e.g.
5in the above example - elements leftwards of the pivot will be in desc order and to the rightwards will be in asc order
- in the given array (sorted and then rotated i.e. unsorted), if we calc a mid and check edges of halves w.r.t it, then only one half can be sorted and the other has to be unsorted; we can check which is which and move accordingly in problems below. This property holds in all further smaller unsorted halves too.
- duplicates can mess up edges checking so we need to handle that case separately (if duplicates maybe present)
Search in rotated sorted array (no duplicates): (link) we can’t tell which half to goto only by looking at arr[mid] and target here, we can only do that in sorted arrays (or sorted half). One half will always be sorted in a rotated array! check which one and goto it only if target is in its range, otherwise goto the other half (even if unsorted) i.e. iteratively looking for a sorted half.
Search in rotated sorted array (duplicates present): (link) same as above but with an additional condition where arr[mid] == arr[low] && arr[mid] == arr[high] causing indicision on which side to move to. If this condition is true, then do low++; high--; continue;, otherwise proceed just as in the previous problem.
Convergence Search
TEMPLATE#3 - searching on a space (convergence search) as opposed to a value search (for k):
int low = 0, high = n - 1; // closed interval
while(low < high){
int mid = low + (high - low) / 2;
if (condition)
low = mid + 1;
else
high = mid;
}
// this may look like half-open interval template because of the loop condition and high = mid update, but its closed only because high = n - 1
// remember, we are converging to an element by shrinking the search space here unlike before where we skipped the element at mid by updating high = mid - 1
// also, this is the first-true / min-feasible goals template discussed below
[!TIP] Stick to
while(low <= high)for value search as it checks whenmid = low = hightoo (i.e. single remaining element). Usewhile(low < high)for convergence search problems as no such value check is needed andlowneeds to satisfy convergence property at the end. Although both styles or writing code can be used for convergence as shown in below section (Universal Templates).
Find minimum in rotated sorted array (no duplicates): (link) leftmost element (arr[low] or arr[mid]) in the sorted half will be the lowest
- check which half is sorted, get minimum so far from it either
arr[low]orarr[mid], and go to the other part tracking global min at every step. - a more terse way is compare
arr[mid] > arr[high]to detect which side is sorted, discard the sorted side, and keep the unsorted side untillowreaches the pivot. Avoid comparingarr[low] <= arr[mid]as it won’t cover the case[1,2,3,4,5]where array is sorted but never rotated. - if duplicates are present (link): like in
[2,2,2,0,2], we need to add condition forarr[mid] == arr[high]then dohigh--to remove ambiguity on which side to goto. It doesn’t harm the search space for min element asarr[mid]will still be present in space, we only removed its duplicate.
int findMin(vector<int>& nums) {
int low = 0, high = nums.size() - 1; // notice
while (low < high) {
int mid = low + (high - low) / 2;
if (nums[mid] > nums[high]) // important for min element finding
low = mid + 1;
else
high = mid; // looks half-open, but isn't
}
return nums[low]; // min is at low in the end
}
// in short - shrink until one element remains
Find out how many times has an array been rotated: can be found out from index of the min element.
Check if array is sorted and rotated: elements leftwards of the pivot will be in desc order and to the rightwards will be in asc order. After finding the pivot, check sorted property of left half and right half manually (TC = O(n)).
Single element in a sorted array (link): first occurrence is supposed to be at even index and other at odd, but after the single element, it will be vice-versa. Goto mid and if mid % 2 == 0 check mid+1, if mid % 2 != 0 check mid-1. Go in the direction of first single element everytime shrinking search space. Edge case is when array has only 1 element, handled implicitly with condition while(low < high).
Find peak element (link): calc mid, if its a corner (arr[0] or arr[n-1]) then peak is that corner value itself, return it. Else check peaks among arr[mid-1], arr[mid] and arr[mid+1], return if its a peak, else keep moving in the direction of the greater element (as it guarantees at least one peak - either the greater element or an extreme corner eventually). Edge case is when there is just a single element in the array, which is implicitly handled by loop condition while(low < high) and then return low at the end.
BS on Space
Tips to identify when BS is an appropriate solution for searching on an answer space:
- asked to minimize or maximize some feasible value
- the feasible value has a fixed but very large answer space
- feasibility behaves monotonically (e.g.
true true false false false) and we can check feasibility at a given point and judge in which dir we should navigate based on result of the check. - feasibility check for a given guess can replace binary search comparison conditions to navigate in answer space. This check can also use some other quantity like hours to finish all piles at a given eating speed (
mid). Notice that these two are inversely related to each other.
Universal Templates
Min-feasible is exactly equal to LB so both templates work normally for it, wheareas max-feasible is not UB but one index leftwards of UB so we tweak some things. Ex - [1 2 2 2 3].
FIRST-TRUE / MIN-FEASIBLE: return lowest index where check(mid) becomes true. This is nothing but Lower Bound.
# low < high version
low = 0, high = n - 1
while (low < high):
mid = (low + high) / 2
if (check(mid)):
high = mid
else:
low = mid + 1
return low
# low <= high version
low = 0, high = n - 1
while (low <= high):
mid = (low + high) / 2
if (check(mid)):
high = mid - 1
else:
low = mid + 1
return low
Uses: find exact target / pivot / unique element / minimum element, koko eating bananas, book allocation, etc.
LAST-TRUE / MAX-FEASIBLE: return highest index where check(mid) is true. This isn’t Upper Bound but an index lower than it.
# low < high version
low = 0, high = n - 1
while (low < high):
mid = (low + high + 1) / 2 # upper mid
if (check(mid)):
low = mid # notice
else:
high = mid - 1
return low
# low <= high version
low = 0, high = n - 1
while (low <= high):
mid = (low + high) / 2
if (check(mid)):
low = mid + 1
else:
high = mid - 1
return high # notice; one index leftwards of UB (which will be at low)
NOTE: mid calc is for upper mid because we will be go in infinite loop in cases like [2, 5] as we’re updating low = mid. So we need to calc mid in the direction we’re moving in i.e. rightwards in this case because we want to find last-true or max-feasible.
Uses: floor(sqrt(x)), max ribbon length, max feasible speed / capacity / threshold, aggressive cows, etc.
First-true / Min-feasible
Feasibility simulation problems such as the following are simple, just use convergence search templates to find minimum or maximum feasible in answer space.
Problems:
- Koko Eating Bananas
- Minimum Number of Days to Make m Bouquets
- Find the Smallest Divisor Given a Threshold: this is exactly Koko Eating Bananas just diff problem statement
- Capacity To Ship Packages Within D Days
- Book Allocation, Split Array Largest Sum, Painter’s Partition: they’re all the exact same problem; minimizing the maximum effort/time any one person has to put in.
- Minimize Max Distance to Gas Station
Kth Missing Positive Number: (link) for each index i, the number of missing positives up to arr[i] is arr[i] - (i + 1). The K-th missing number is the first point where this missing count reaches k, and at that index the answer is simply i + k, because that’s how many numbers are present in array which are strictly less than arr[i] plus the extra k missing ones.
- shift
k++each time an element<= kis encountered in array - really smart one liner! (TC =O(n)) - BS solution: we search based on property
arr[i]-(i+1)and check it on everymidand move accordingly, on==condition we have exactlykmissing elements in left ofmidand our ans lies just belowarr[mid](i.e.ans = i+k), we want LB (smallest index such thatarr[i]-(i+1) >= k) so move leftwards, finally returnlow + kafter loop break.
Last-true / Max-feasible
Sqrt of a number (integer): (link) This is basically find max number n which satisfies n*n <= x i.e. floor(sqrt(x)). We can init low = 1, high = x/2, but then we’ll need to handle smaller values if(x < 2) return x.
int low = 0, high = x;
while (low <= high) {
int mid = low + (high - low) / 2;
if (mid * mid <= x)
low = mid + 1;
else
high = mid - 1;
}
return high;
- for double precision: use an epsilon because
lowmay never be truly equal tohigh, and updates are such that because we don’t know how big a jump to take here (we take1in integer sqrt) since its a continuous space search. ```cpp double eps = 1e-6; // upto 6 digits after decimal while( (high - low) > eps ){ double mid = (low + high) / 2.0; if (feasible(mid)) high = mid; else low = mid; } return low; // max-feasible
// return high for min-feasible ```
Epsilon BS problems: Nth Root of a Number using Binary Search, Minimize Max Distance to Gas Station
Max-feasible problems:
- Aggressive Cows
2 Sorted Arrays - Cutpoints
Find K-th element of two sorted arrays: (link) use BS on smaller array to find how many elements i to take from the smaller array, the remaining k - i elements come from the other array, the correct cutpoint satisfies l1 <= r2 && l2 <= r1, and the answer is max(l1, l2) at the point. TC = O(log min(n, m)).
- init
low = max(0, k - n)because the least we can take from smaller array is either0ork - n(ifk > n; we must take some elements from smaller array). Similarlyhigh = min(k, m)because the most we can take from smaller array is eitherkorm(ifk > m; we must take all elements of smaller array) mid1becomesr1andmid1 - 1becomesl1, similarily getl2andr2asmid2 = k - mid1. Check out-of-boundsmid1andmid2and setl1 l2 r1 r2asINT_MINorINT_MAXacc (user conditional operator)- if
l1 > r2then move leftwards (as we want to decreasel1), else ifl2 > r1then move rightwards (as we want to increasel2)
Median of two sorted arrays: (link) use cutpoints approach above k = (m + n + 1) / 2. Handle median calc in case total elements are even or odd as follows:
- even - average of max left and min right (middle value of
kth andk+1th element as both are middle in this case) - odd - max of left (same as
kth element)
BS on Matrix
Find the row with max number of 1s:
O(m * n)scan but if1s are a lot lesser than0s then checking every row from the right end is a better strategyO(n * log m)solution where we use LB search to find leftmost1and calc total number of1s
Search in row and column wise sorted matrix with no overlaps among rows: (link)
- two levels of BS (
O(logn + logm)): use BS to find suitable row where element may lie and then do BS on that particular row to find target element. Applicable only if no overlaps in rows. - simulate flattening (
O(log (n * m)); asymptotically identical to above). Applicable only if no overlaps in rows.- row-wise and column-wise sorted matrix after flattening needn’t be a sorted 1D array! (but the LC problem’s second property eliminates this possibility)
low = 0andhigh = n*m - 1so we can perform BS in that space and search target element by calcrow = mid / nandcol = mid % mfor eachmidon the way
Search in row and column wise sorted matrix with overlaps among rows: (link)
- go to every row and do BS for target element if its in range of row elements. TC =
O(n * log m). - staircase search in
O(m + n)time (best solution as it has linear TC and works with both row overlaps and no row overlaps!)
K-th element in row and column wise sorted matrix: (link)
- consider answer space
mat[0][0]tomat[n-1][m-1]and calc amid, count elements less thanmidin each row using UB, and to find “actual” mid that is present in matrix do LB on that space. TC =O(log R * n * log m)whereR = mat[n-1][m-1] - mat[0][0]. - LB since we’re trying to find smallest element (
mid) in search space such that it gives uskelements in the matrix and is actually present in the matrix
Median of row-wise sorted matrix: matrix is sorted only row-wise, not column-wise, and r*c is given to be odd
- find min and max elements of matrix from
0th column andc-1th column respectively - this is our median search space - perform BS on this median space: calc
mid, and for each row too use BS to find numbers<= mid(i.e. calc upper bound ofmidfor each row) - Nested Binary Search - use property that for an element
midto be median it needs to haver*c/2elements to the left of itself. If number of elements aren’t sufficient or are excess, move left or right accordingly - on equality condition
countSmallEqual(mid) == r*c/2, our ans is first element havingcountSmallEqual(mid) > r*c/2(upper bound), so we move rightwards,lowwill eventually converge to UB
Find the Duplicate Number: this can be optimally solved using BS or with Floyd’s cycle detection (notes)