Lecture 6: Binary Search

Triangle:
Divide and Conquer, Big O
Binary Search
Binary search + coding, Big O Testing & timing

Quality of code:
Correctness (code, specification)
Efficiency (time, space)
Style (clarity)
Robustness

Binary Search
General Idea:
Sorted Array
Find mid.
A[mid] == x:  done
A[mid] > x:   look left
A[mid] < x:   look right

What else do we need:
index of lowest element in range
index of highest element in range (or should we store index of element
after highest element in range)
More convenient to store index one past last element in range
So segment is A[lower, upper)  (not C0 notation)

First element: index 0
Index one higher than last element:  index n (index \length(A))

mid = lower + (upper-lower)/2;

look left:  upper = mid, lower stays the same
(is it ok to set upper to mid, since we exclude that element in range? yes
since we already compared it)
look right: lower = mid+1, upper stays the same

How do we start the search?
lower = 0
upper = \length(A)

Done: interval size 1 after testing
OR
When do we continue:
lower != upper-1 ? Let's see what invariants show as we write code

int binsearch(int x, int[] A, int n)
//@requires 0 <= n && n <= \length(A);
//@requires is_sored(A, n);
/*@ensures (-1 == \result && !is_in(x, A, n))
          || ((0 <= \result && \result < n) && A[\result == x);
 @*/
{
       int lower = 0;
       int upper = n;
       int mid = 0;
       while (lower != upper)
       //@loop_invariant  0 <= lower && lower < upper && upper <= n; //WRONG
       //@loop_invariant lower == 0 || A[lower-1] < x;
       //@loop_invariant upper == n || A[upper] > x;
       {
               int mid = lower + (upper-lower)/2;
               //@assert lower <= mid && mid < upper;
               if (A[mid == x) return mid;
               else if A[mid] > x) upper = mid;
               else
               //@assert A[mid] < x;
               {
                       lower = mid+1;
               }
       }
       return -1;
}

//@loop_invariant A[lower] <= A[mid] && A[mid] <= A[upper-1]
doesn't tell us much more since we know array is sorted

//@loop_invariant A[lower] <= x && x <= A[upper-1]
if we're searching for value outside of data range, false

2nd and 3rd loop invariants above say where x is NOT, rather than where x is.
What about just A[lower] <= x?  Could be false initially. We haven't done
any comparisons so we don't know... A[lower] could be greater than x.
Likewise just A[upper-1] >= x is not a valid invariant.

Trace: does this work for an array of size 1?  Yes
In this case, loop does execute, and  mid = lower.
If A[0] != x, then we either set upper = mid (which is lower) so loop ends
or we set lower = mid+1 (which is upper) so loop ends.
Either way we return -1.
Is loop invariant satisfied? lower can be equal to upper.

//@loop_invariant  0 <= lower && lower <= upper && upper <= n;
change loop condition to (lower < upper)

Remember: if upper == lower, then we have 0 elements left to examine,
since upper points to the element AFTER the last element in range

Array of size 0?  works ok

int binsearch(int x, int[] A, int n)
//@requires 0 <= n && n <= \length(A);
//@requires is_sored(A, n);
/*@ensures (-1 == \result && !is_in(x, A, n))
          || ((0 <= \result && \result < n) && A[\result == x);
 @*/
{
       int lower = 0;
       int upper = n;
       int mid = 0;
       while (lower < upper)
       //@loop_invariant  0 <= lower && lower <= upper && upper <= n;
       //@loop_invariant lower == 0 || A[lower-1] < x;
       //@loop_invariant upper == n || A[upper] > x;
       {
               int mid = lower + (upper-lower)/2;
               //@assert lower <= mid && mid < upper;
               if (A[mid == x) return mid;
               else if A[mid] > x) upper = mid;
               else  //@assert A[mid] < x;
                       lower = mid+1;
       }
       return -1;
}

[Tom: No brackets needed for last else?]

Can we add special checks to see if x < A[0] or x > A[m-1] to exit early?
Sure but you're not saving a lot of operations since this is logarithmic
already so at most ~30 operations.

Test runs

We can test linsearch and binsearch with same tester (with function name
change) since they satisfy the same contract.

Compile without -d and it runs very fast. Because with -d we check contracts
which include linear search.

UNIX command:  time ./a.out

Big O

When it's compiled, we don't what's happening in compiler and what processor
we're on... etc.

1. We are most concerned with functions run on LARGE inputs.
For very small-sized inputs, your functions will run very fast no matter
what.
As we increase the size of the input, can we determine how the time will
grow?

2. We don't care about constant factors.

Linear Search:  c * n times
Binary Search:  c' * log(n) times

The constants may vary depending on how many computations we do in each
iteration and what the compiler does to compile and optimize your code, etc.

There is a constant c and some n0 such that
for all n > some n0, g(n) <= c*f(n).

(these are running times, not returned values)
(not for all n since it violates first principle)

We say O(f) : class of functions such that this condition holds (above)

g in O(f)

(in literature, g = O(f))

O(1)     constant time (e.g. access ith element)
O(log n) logarithmic
O(n)     linear

To test Big O:
double the size of the input, see how running time changes.
For a linear algorithm, it should double (approximately)
(shows timing interactively)

for logarithmic:
log (2n) = log n + log 2 = log n + 1  (so 1 extra operation)
so very little time increase