Lecture 6: Binary Search

Triangle:
  Divide and Conquer, Big O
  Binary Search
  Binary search + coding, Big O testing & timing

Quality of code:
  Correctness (code, specification)
  Efficiency (time, space)
  Style (clarity)
  Robustness

Binary Search
General idea (sorted array):
  Find mid.
  A[mid] == x: done
  A[mid] > x: look left
  A[mid] < x: look right
What else do we need?
  the index of the lowest element in the range
  the index of the highest element in the range (or should we store the index of the element after the highest element in the range?)
It is more convenient to store the index one past the last element in the range, so the segment is A[lower, upper) (this is not C0 notation).
  First element: index 0
  Index one higher than the last element: index n (i.e., \length(A))
mid = lower + (upper-lower)/2;   (rather than (lower+upper)/2, which can overflow when lower + upper exceeds the largest int)
look left: upper = mid, lower stays the same. (Is it OK to set upper to mid, which excludes that element from the range? Yes, since we have already compared it.)
look right: lower = mid+1, upper stays the same.
How do we start the search? lower = 0, upper = \length(A).
When are we done? When the interval has size 1 after testing? Or, when do we continue: lower != upper-1?
Let's see what the invariants tell us as we write the code:

int binsearch(int x, int[] A, int n)
//@requires 0 <= n && n <= \length(A);
//@requires is_sorted(A, n);
/*@ensures (-1 == \result && !is_in(x, A, n))
        || ((0 <= \result && \result < n) && A[\result] == x);
  @*/
{
  int lower = 0;
  int upper = n;
  while (lower != upper)
    //@loop_invariant 0 <= lower && lower < upper && upper <= n; //WRONG
    //@loop_invariant lower == 0 || A[lower-1] < x;
    //@loop_invariant upper == n || A[upper] > x;
  {
    int mid = lower + (upper-lower)/2;
    //@assert lower <= mid && mid < upper;
    if (A[mid] == x) return mid;
    else if (A[mid] > x) upper = mid;
    else {
      //@assert A[mid] < x;
      lower = mid+1;
    }
  }
  return -1;
}

Other candidate invariants:
//@loop_invariant A[lower] <= A[mid] && A[mid] <= A[upper-1];
  This doesn't tell us much, since we already know the array is sorted.
//@loop_invariant A[lower] <= x && x <= A[upper-1];
  False if we're searching for a value outside the range of the data.
The 2nd and 3rd loop invariants in the code say where x is NOT, rather than where x is. What about just A[lower] <= x? That could be false initially: we haven't done any comparisons yet, so A[lower] could be greater than x. Likewise, A[upper-1] >= x alone is not a valid invariant.

Trace: does this work for an array of size 1? Yes. The loop executes once, with mid = lower = 0. If A[0] == x, we return 0. If A[0] != x, we either set upper = mid (which equals lower) or set lower = mid+1 (which equals upper), so the loop ends either way and we return -1.
Is the loop invariant satisfied? No: after that iteration lower == upper, which violates lower < upper. Fix: weaken the first invariant to
//@loop_invariant 0 <= lower && lower <= upper && upper <= n;
and change the loop condition to (lower < upper).
Remember: if lower == upper, there are 0 elements left to examine, since upper is the index of the element AFTER the last element in the range.
Array of size 0?
Works OK with the corrected invariant and loop condition: lower == upper == 0, so the loop body never runs and we return -1.

int binsearch(int x, int[] A, int n)
//@requires 0 <= n && n <= \length(A);
//@requires is_sorted(A, n);
/*@ensures (-1 == \result && !is_in(x, A, n))
        || ((0 <= \result && \result < n) && A[\result] == x);
  @*/
{
  int lower = 0;
  int upper = n;
  while (lower < upper)
    //@loop_invariant 0 <= lower && lower <= upper && upper <= n;
    //@loop_invariant lower == 0 || A[lower-1] < x;
    //@loop_invariant upper == n || A[upper] > x;
  {
    int mid = lower + (upper-lower)/2;
    //@assert lower <= mid && mid < upper;
    if (A[mid] == x) return mid;
    else if (A[mid] > x) upper = mid;
    else {
      //@assert A[mid] < x;
      lower = mid+1;
    }
  }
  return -1;
}

[Tom: No braces needed for the last else?] Braces around a single statement are optional, but here they make it unambiguous that both the //@assert and the assignment belong to the else branch.

Can we add special checks to see whether x < A[0] or x > A[n-1], to exit early? Sure, but that doesn't save many operations: the search is already logarithmic, so it takes at most about 31 iterations for any array whose length fits in an int.

Test runs
We can test linsearch and binsearch with the same tester (changing only the function name), since they satisfy the same contract.
Compile without -d and it runs very fast. With -d, the contracts are checked at run time, and they include a linear search (is_in), so every call costs linear time anyway.
UNIX command: time ./a.out

Big O
Once the code is compiled, we don't know exactly what the compiler did, which processor we're running on, etc. So we abstract:
1. We are most concerned with functions run on LARGE inputs. For very small inputs, your functions will run fast no matter what. As we increase the size of the input, how does the running time grow?
2. We don't care about constant factors.
   Linear search: c * n steps
   Binary search: c' * log(n) steps
   The constants vary with how many computations we do in each iteration, how the compiler translates and optimizes the code, etc.

Definition: there are a constant c > 0 and some n0 such that for all n > n0, g(n) <= c*f(n). (g and f are running times, not returned values.) We don't require this for all n, since that would violate the first principle: only large inputs matter.
O(f) is the class of functions g for which this condition holds. We write g in O(f) (in the literature, g = O(f)).
  O(1)      constant time (e.g., accessing the ith element of an array)
  O(log n)  logarithmic
  O(n)      linear

To test Big O: double the size of the input and see how the running time changes.
  For a linear algorithm, the time should (approximately) double. (Timing demonstrated interactively in lecture.)
  For a logarithmic algorithm: log(2n) = log n + log 2 = log n + 1 (log base 2), so doubling the input adds just 1 more iteration and the running time barely increases.