Intro

Joke: https://labuladong.gitbook.io/algo-en/iii.-algorithmic-thinking/detailedbinarysearch

gotcha in the above joke is that Binary Search is only applicable when you’re absolutely certain that the target element doesn’t lie in the half being discarded. Works for sorted numbers, but not for a pile of books :D

Templates

Binary Search impl can be extremely tricky because of checking and updating search space bounds. Ref

mid = floor(low+high)/2;

mid = floor(low+(high-low))/2;   // avoids overflow

Two templates of writing BS (they differ by interval range [] vs [)):

  • TEMPLATE#1 - the “closed interval” binary search ([low, high]), the classic textbook binary search: ```cpp int low = 0, high = arr.size() - 1; // this makes it closed interval

while(low <= high){ int mid = low + (high - low) / 2;

if (k == arr[mid]) return mid; else if (k > arr[mid]) low = mid + 1; else high = mid - 1; }

// at this point in code, low = high + 1 // if method didn’t return till now ofc

return -1;


- **TEMPLATE#2** - the "half-open interval" binary search (`[low, high)`), C++ STL-like impl:
```cpp
int low = 0, high = arr.size();        // this makes it half-open interval

while(low < high){                    // line 1
  int mid = low + (high - low) / 2;

  if (k == arr[mid]) return mid;
  else if (k > arr[mid]) low = mid + 1;
  else high = mid;                    // line 2
}

// at this point in code, low = high (because of line 1)
// if method didn't return till now ofc

return -1;

NOTICE: line 1 and line 2 go hand-in-hand to make sure we move correctly. Whenever in doubt, dry run the case [1 2 3] with k = 1. Half-open interval maybe counter-intuitive at times, so just avoid it completely!

Bounds

Lower Bound: lower bound of x is the smallest index i such that arr[i] >= x. Ex - in [2 4 5], lower bound of 3 is 4 (not 2) and lower bound of 4 is 4 itself. Also, observe that LB of 2 in [1,2,2,3] is 2 at index 1.

Upper Bound: upper bound of x is the smallest index i such that arr[i] > x. Ex - in [2 4 5], upper bound of 3 is 4 and upper bound of 4 is 5. Also, observe that UB of 2 in [1,2,2,3] is 3 at index 3.

Note that LB(x) = UB(x) if element x is not present in the array. Also, there maybe no LB/UB (return -1) if we go out of bounds searching for it.

One interesting observation for LB/UB is that when we break out of the while loop, the lower or upper bound is at index low, no matter whichever template we’re using! (clarification)

  • keep arr[mid] = k condition on the direction we want to move in to skip duplicates (obvious). In LB we move leftwards in duplicates, in UB we move rightwards in duplicates. At the end, low will always end up at the answer.
  • if the LB or UB for a given k does not exist in the array, then after the loop low == arr.size().

Code

Related Problems:

  • https://leetcode.com/problems/find-smallest-letter-greater-than-target
  • Search Insert Position: (link) insert position is LB/UB only, depends on where question wants us to insert if element is already present in array.
  • Problems on counting occurrences of an element: link, link

[!TIP] LB, UB, Floor, Ceil - Striver’s way of storing in ans on every potential candidate is very simple and intuitive than above low pointer approach (TEMPLATE#2).

Floor and Ceil of x: if target is 4, low and high will eventually converge at [2 5] and after two more steps, we’ll have our greatest number less than x at high and lowest number greater than x at low. Take care of equal to cases and duplicates using strategy discussed above (for lower and upper bound)

  • ceil(x) is equal to LB(x), floor(x) can be the number smaller than x here (unlike LB) so its not at low, it’s at low - 1 (if low > 0)
  • if element x is present in array then floor = x = ceil (ofc)

Find the first and last occurrence of a given number in a sorted array:

  • First position: on arr[mid] == k store ans = mid and reduce search space to left array only (high = mid - 1), rest is same as in normal BS. Alternatively, first position is LB if number is present, check LB index to detect presence.
  • Last position: on arr[mid] == k store ans = mid and reduce search space to right array only (low = mid + 1), rest is same as in normal BS. Alternatively, first position is UB’s index - 1 if number is present, check UB index to detect presence.

Count occurrences of an element in a sorted array: count will be rightmost_index - leftmost_index + 1

  • use previous approach to find leftmost and righmost indices (with or without LB/UB)
  • another way but linear TC: find any occurrence of it using BS (if not found return -1), either check idx == -1 that means occurrence is 0.If a valid idx, linearly scan its left half and right half for more occurrences (TC = O(logn) + O(n) => O(n))

1D Arrays

Observations for sorted and then rotated arrays e.g. [4,5,1,2,3]:

  • there will be a single inflection point (aka pivot) e.g. 5 in the above example
  • elements leftwards of the pivot will be in desc order and to the rightwards will be in asc order
  • in the given array (sorted and then rotated i.e. unsorted), if we calc a mid and check edges of halves w.r.t it, then only one half can be sorted and the other has to be unsorted; we can check which is which and move accordingly in problems below. This property holds in all further smaller unsorted halves too.
  • duplicates can mess up edges checking so we need to handle that case separately (if duplicates maybe present)

Search in rotated sorted array (no duplicates): (link) we can’t tell which half to goto only by looking at arr[mid] and target here, we can only do that in sorted arrays (or sorted half). One half will always be sorted in a rotated array! check which one and goto it only if target is in its range, otherwise goto the other half (even if unsorted) i.e. iteratively looking for a sorted half.

Search in rotated sorted array (duplicates present): (link) same as above but with an additional condition where arr[mid] == arr[low] && arr[mid] == arr[high] causing indicision on which side to move to. If this condition is true, then do low++; high--; continue;, otherwise proceed just as in the previous problem.

TEMPLATE#3 - searching on a space (convergence search) as opposed to a value search (for k):

int low = 0, high = n - 1;    // closed interval
while(low < high){
  int mid = low + (high - low) / 2;
  if (condition)
    low = mid + 1;
  else
    high = mid;
}

// this may look like half-open interval template because of the loop condition and high = mid update, but its closed only because high = n - 1
// remember, we are converging to an element by shrinking the search space here unlike before where we skipped the element at mid by updating high = mid - 1
// also, this is the first-true / min-feasible goals template discussed below

[!TIP] Stick to while(low <= high) for value search as it checks when mid = low = high too (i.e. single remaining element). Use while(low < high) for convergence search problems as no such value check is needed and low needs to satisfy convergence property at the end. Although both styles or writing code can be used for convergence as shown in below section (Universal Templates).

Find minimum in rotated sorted array (no duplicates): (link) leftmost element (arr[low] or arr[mid]) in the sorted half will be the lowest

  • check which half is sorted, get minimum so far from it either arr[low] or arr[mid], and go to the other part tracking global min at every step.
  • a more terse way is compare arr[mid] > arr[high] to detect which side is sorted, discard the sorted side, and keep the unsorted side until low reaches the pivot. Avoid comparing arr[low] <= arr[mid] as it won’t cover the case [1,2,3,4,5] where array is sorted but never rotated.
  • if duplicates are present (link): like in [2,2,2,0,2], we need to add condition for arr[mid] == arr[high] then do high-- to remove ambiguity on which side to goto. It doesn’t harm the search space for min element as arr[mid] will still be present in space, we only removed its duplicate.
int findMin(vector<int>& nums) {
    int low = 0, high = nums.size() - 1;    // notice

    while (low < high) {
        int mid = low + (high - low) / 2;

        if (nums[mid] > nums[high])      // important for min element finding
            low = mid + 1;
        else
            high = mid;                // looks half-open, but isn't
    }

    return nums[low];                // min is at low in the end
}

// in short - shrink until one element remains

Find out how many times has an array been rotated: can be found out from index of the min element.

Check if array is sorted and rotated: elements leftwards of the pivot will be in desc order and to the rightwards will be in asc order. After finding the pivot, check sorted property of left half and right half manually (TC = O(n)).

Single element in a sorted array (link): first occurrence is supposed to be at even index and other at odd, but after the single element, it will be vice-versa. Goto mid and if mid % 2 == 0 check mid+1, if mid % 2 != 0 check mid-1. Go in the direction of first single element everytime shrinking search space. Edge case is when array has only 1 element, handled implicitly with condition while(low < high).

Find peak element (link): calc mid, if its a corner (arr[0] or arr[n-1]) then peak is that corner value itself, return it. Else check peaks among arr[mid-1], arr[mid] and arr[mid+1], return if its a peak, else keep moving in the direction of the greater element (as it guarantees at least one peak - either the greater element or an extreme corner eventually). Edge case is when there is just a single element in the array, which is implicitly handled by loop condition while(low < high) and then return low at the end.

BS on Space

Tips to identify when BS is an appropriate solution for searching on an answer space:

  • asked to minimize or maximize some feasible value
  • the feasible value has a fixed but very large answer space
  • feasibility behaves monotonically (e.g. true true false false false) and we can check feasibility at a given point and judge in which dir we should navigate based on result of the check.
  • feasibility check for a given guess can replace binary search comparison conditions to navigate in answer space. This check can also use some other quantity like hours to finish all piles at a given eating speed (mid). Notice that these two are inversely related to each other.

Universal Templates

Min-feasible is exactly equal to LB so both templates work normally for it, wheareas max-feasible is not UB but one index leftwards of UB so we tweak some things. Ex - [1 2 2 2 3].

FIRST-TRUE / MIN-FEASIBLE: return lowest index where check(mid) becomes true. This is nothing but Lower Bound.

# low < high version
low = 0, high = n - 1
while (low < high):
  mid = (low + high) / 2
  if (check(mid)):
    high = mid 
  else:
    low = mid + 1
return low

# low <= high version
low = 0, high = n - 1
while (low <= high):
  mid = (low + high) / 2
    if (check(mid)):
      high = mid - 1
    else:
      low = mid + 1
return low

Uses: find exact target / pivot / unique element / minimum element, koko eating bananas, book allocation, etc.

LAST-TRUE / MAX-FEASIBLE: return highest index where check(mid) is true. This isn’t Upper Bound but an index lower than it.

# low < high version
low = 0, high = n - 1
while (low < high):
  mid = (low + high + 1) / 2    # upper mid
  if (check(mid)):
    low = mid        # notice
  else:
    high = mid - 1
return low

# low <= high version
low = 0, high = n - 1
while (low <= high):
  mid = (low + high) / 2
  if (check(mid)):
    low = mid + 1
  else:
    high = mid - 1
return high          # notice; one index leftwards of UB (which will be at low)

NOTE: mid calc is for upper mid because we will be go in infinite loop in cases like [2, 5] as we’re updating low = mid. So we need to calc mid in the direction we’re moving in i.e. rightwards in this case because we want to find last-true or max-feasible.

Uses: floor(sqrt(x)), max ribbon length, max feasible speed / capacity / threshold, aggressive cows, etc.

First-true / Min-feasible

Feasibility simulation problems such as the following are simple, just use convergence search templates to find minimum or maximum feasible in answer space.

Problems:

Kth Missing Positive Number: (link) for each index i, the number of missing positives up to arr[i] is arr[i] - (i + 1). The K-th missing number is the first point where this missing count reaches k, and at that index the answer is simply i + k, because that’s how many numbers are present in array which are strictly less than arr[i] plus the extra k missing ones.

  • shift k++ each time an element <= k is encountered in array - really smart one liner! (TC = O(n))
  • BS solution: we search based on property arr[i]-(i+1) and check it on every mid and move accordingly, on == condition we have exactly k missing elements in left of mid and our ans lies just below arr[mid] (i.e. ans = i+k), we want LB (smallest index such that arr[i]-(i+1) >= k) so move leftwards, finally return low + k after loop break.

Last-true / Max-feasible

Sqrt of a number (integer): (link) This is basically find max number n which satisfies n*n <= x i.e. floor(sqrt(x)). We can init low = 1, high = x/2, but then we’ll need to handle smaller values if(x < 2) return x.

int low = 0, high = x;
while (low <= high) {
  int mid = low + (high - low) / 2;
  if (mid * mid <= x)
    low = mid + 1;
  else
    high = mid - 1;
}
return high;
  • for double precision: use an epsilon because low may never be truly equal to high, and updates are such that because we don’t know how big a jump to take here (we take 1 in integer sqrt) since its a continuous space search. ```cpp double eps = 1e-6; // upto 6 digits after decimal while( (high - low) > eps ){ double mid = (low + high) / 2.0; if (feasible(mid)) high = mid; else low = mid; } return low; // max-feasible

// return high for min-feasible ```

Epsilon BS problems: Nth Root of a Number using Binary Search, Minimize Max Distance to Gas Station

Max-feasible problems:

  • Aggressive Cows

2 Sorted Arrays - Cutpoints

Find K-th element of two sorted arrays: (link) use BS on smaller array to find how many elements i to take from the smaller array, the remaining k - i elements come from the other array, the correct cutpoint satisfies l1 <= r2 && l2 <= r1, and the answer is max(l1, l2) at the point. TC = O(log min(n, m)).

  • init low = max(0, k - n) because the least we can take from smaller array is either 0 or k - n (if k > n; we must take some elements from smaller array). Similarly high = min(k, m) because the most we can take from smaller array is either k or m (if k > m; we must take all elements of smaller array)
  • mid1 becomes r1 and mid1 - 1 becomes l1, similarily get l2 and r2 as mid2 = k - mid1. Check out-of-bounds mid1 and mid2 and set l1 l2 r1 r2 as INT_MIN or INT_MAX acc (user conditional operator)
  • if l1 > r2 then move leftwards (as we want to decrease l1), else if l2 > r1 then move rightwards (as we want to increase l2)

Median of two sorted arrays: (link) use cutpoints approach above k = (m + n + 1) / 2. Handle median calc in case total elements are even or odd as follows:

  • even - average of max left and min right (middle value of kth and k+1th element as both are middle in this case)
  • odd - max of left (same as kth element)

BS on Matrix

Find the row with max number of 1s:

  • O(m * n) scan but if 1s are a lot lesser than 0s then checking every row from the right end is a better strategy
  • O(n * log m) solution where we use LB search to find leftmost 1 and calc total number of 1s

Search in row and column wise sorted matrix with no overlaps among rows: (link)

  • two levels of BS (O(logn + logm)): use BS to find suitable row where element may lie and then do BS on that particular row to find target element. Applicable only if no overlaps in rows.
  • simulate flattening (O(log (n * m)); asymptotically identical to above). Applicable only if no overlaps in rows.
    • row-wise and column-wise sorted matrix after flattening needn’t be a sorted 1D array! (but the LC problem’s second property eliminates this possibility)
    • low = 0 and high = n*m - 1 so we can perform BS in that space and search target element by calc row = mid / n and col = mid % m for each mid on the way

Search in row and column wise sorted matrix with overlaps among rows: (link)

  • go to every row and do BS for target element if its in range of row elements. TC = O(n * log m).
  • staircase search in O(m + n) time (best solution as it has linear TC and works with both row overlaps and no row overlaps!)

K-th element in row and column wise sorted matrix: (link)

  • consider answer space mat[0][0] to mat[n-1][m-1] and calc a mid, count elements less than mid in each row using UB, and to find “actual” mid that is present in matrix do LB on that space. TC = O(log R * n * log m) where R = mat[n-1][m-1] - mat[0][0].
  • LB since we’re trying to find smallest element (mid) in search space such that it gives us k elements in the matrix and is actually present in the matrix

Median of row-wise sorted matrix: matrix is sorted only row-wise, not column-wise, and r*c is given to be odd

  • find min and max elements of matrix from 0th column and c-1th column respectively - this is our median search space
  • perform BS on this median space: calc mid, and for each row too use BS to find numbers <= mid (i.e. calc upper bound of mid for each row) - Nested Binary Search
  • use property that for an element mid to be median it needs to have r*c/2 elements to the left of itself. If number of elements aren’t sufficient or are excess, move left or right accordingly
  • on equality condition countSmallEqual(mid) == r*c/2, our ans is first element having countSmallEqual(mid) > r*c/2 (upper bound), so we move rightwards, low will eventually converge to UB

Find the Duplicate Number: this can be optimally solved using BS or with Floyd’s cycle detection (notes)