Big-O Notation

Runtime of a Program

Computers are generally used to process large amounts of data. When we run a program on a large input, we must be certain that the program terminates within a reasonable amount of time. Although the running time depends somewhat on the programming language we use, and to a smaller extent on the methodology (such as procedural versus object-oriented), those factors are often unchangeable constants of the design. Even so, the running time is most strongly determined by the choice of algorithm. An algorithm is a clearly specified set of instructions the computer follows to solve a problem. Once an algorithm is given for a problem and determined to be correct, the next step is to determine the amount of resources, such as time and space, that the algorithm will require. This step is called algorithm analysis. An algorithm that requires several gigabytes of main memory is not useful for most current machines, even if it is completely correct.

 

The runtime of a program depends on many factors, such as the operating system, compiler, language, processor, available memory, data size, and network latency, among other things. In this chapter we give a metric that defines the complexity of a program in terms of the data size n. This measurement is independent of environmental variables such as processor and memory. Recall that an algorithm is a step-by-step way of solving a problem. Each step of the algorithm requires some processing; some steps depend on the data size while others do not. For example, a loop such as

for (int i = 0; i < n; i++) do_something;

is a function of the data size n. But a statement such as int x = y + z; is not a function of the data size, and therefore we assume it takes some constant time to execute. For example, finding the maximum of a sorted array of size n can be done in constant time: we return the first or the last element, depending on whether the array is sorted in decreasing or increasing order. Our goal here is to describe the complexity of an algorithm in terms of the data size n and give a measure as a function of n. Some well-known functions of n are shown in the figure below.
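As a quick illustration of the constant-time case, here is a minimal sketch (the class and method names are our own, for illustration) that finds the maximum of an array already sorted in increasing order:

```java
// Illustrative sketch: the max of an array sorted in increasing order is
// its last element, so the lookup takes the same time no matter how large n is.
public class SortedMax {
    public static int maxOfSorted(int[] sorted) {
        // assumes sorted.length > 0 and increasing order
        return sorted[sorted.length - 1];
    }
}
```

No loop over the data is needed, so the running time does not grow with n.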

Figure Courtesy of Weiss Data Structures and Algorithms

 

Now we give a definition to formalize this process.

Definition: A theoretical measure of the execution of an algorithm, usually the time or memory needed, given the problem size n, which is usually the number of items. Informally, saying some equation f(n) = O(g(n)) means that, for large n, f(n) is bounded above by some constant multiple of g(n). The notation is read, "f of n is big oh of g of n".

Formal Definition: f(n) = O(g(n)) means there are positive constants c and k, such that 0 ≤ f(n) ≤ cg(n) for all n ≥ k. The values of c and k must be fixed for the function f and must not depend on n.
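As a concrete worked example (our own, for illustration): f(n) = 3n + 10 is O(n), with witnesses c = 4 and k = 10, since 3n + 10 ≤ 4n whenever n ≥ 10. A small sketch that checks this bound:

```java
// Checks the Big-O witness c = 4, k = 10 for f(n) = 3n + 10 and g(n) = n.
public class BigOWitness {
    public static boolean boundHolds(int n) {
        // does f(n) <= c * g(n), i.e. 3n + 10 <= 4n, hold at this n?
        return 3 * n + 10 <= 4 * n;
    }
}
```

Note that the bound fails for n < 10; this is exactly why the definition only requires the inequality for all n ≥ k.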
[Figure: graph showing the relation between a function f and the bounding function g]

Figure Courtesy of NIST

In our analysis of algorithms we assume a uni-processor model of computation, namely the random access machine (RAM) model, in which each simple instruction takes one unit of time.

Let us consider some well-known algorithms to get a measure of their runtime. Consider the bubble sort algorithm.

// A is an instance array field of the enclosing class, with length size
public void bubbleSort() {
    for (int i = 0; i < size - 1; i++)
        for (int j = 0; j < size - i - 1; j++)
            if (A[j] > A[j + 1]) {   // swap adjacent out-of-order elements
                int tmp = A[j];
                A[j] = A[j + 1];
                A[j + 1] = tmp;
            }
}

When we run this code using a standard Java compiler, we notice that the time it takes is approximately proportional to the square of the data size. A closer analysis of the code shows that the outer loop runs (n-1) times and, for each i from 0 to (n-2), the inner loop runs (n-i-1) times. This leads to the sum

(n-1) + (n-2) + … + 1, which can be shown to equal n(n-1)/2, a quadratic function of n. Based on the definition of Big-O, this is O(n²), since n(n-1)/2 is bounded by cn² (where c is a constant) for all large n.
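We can confirm the count empirically with an instrumented copy of the sort (the counter is our own addition, for illustration): for an array of size n it performs exactly n(n-1)/2 comparisons, regardless of the input's initial order.

```java
// Bubble sort instrumented to count comparisons; the count is always
// exactly n*(n-1)/2 for an input of length n.
public class BubbleCount {
    public static long sortAndCount(int[] a) {
        long comparisons = 0;
        for (int i = 0; i < a.length - 1; i++)
            for (int j = 0; j < a.length - i - 1; j++) {
                comparisons++;               // one comparison per inner step
                if (a[j] > a[j + 1]) {       // swap out-of-order neighbors
                    int tmp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                }
            }
        return comparisons;
    }
}
```

For n = 10 this returns 45, matching 10·9/2.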

Each algorithm can be analyzed similarly in a mathematical sense, and some of the most common complexities are

 

O(1) - not proportional to any variable; a fixed/constant amount of time. The algorithm does not depend on the size of the data set.

 

O(N) - proportional to N; the algorithm's run time increases linearly with the data size n.

 

O(N²) - proportional to N squared. As you double the size of the data, the program takes four times as long to run.

 

O(log N) – These are very efficient algorithms. Typically, a loop whose index doubles (or halves) on each iteration leads to an algorithm of log N complexity.
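For instance, a loop that doubles its index each time makes only about log₂(n) passes; this small sketch (ours, for illustration) counts them:

```java
// Counts iterations of a doubling loop. For n > 1 the result is
// ceil(log2(n)), which is why such loops give O(log N) algorithms.
public class DoublingLoop {
    public static int steps(int n) {
        int steps = 0;
        for (int i = 1; i < n; i *= 2)   // i doubles each pass
            steps++;
        return steps;
    }
}
```

Even for n = 1024 the loop body runs only 10 times.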

 

O(N log N) - Good comparison-based sorting algorithms fall into this category. This is a lot better than O(N²).

 

O(N!) - Factorial-time algorithms, which grow even faster than exponential ones such as O(2^N). Avoid them at any cost (unless your data size is very small).

 

We can order algorithms based on Big-O analysis; a smaller Big-O generally leads to faster execution for large inputs.

 

O(1) < O(log N) < O(N) < O(N log N) < O(N²) < O(2^N) < O(N!)
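To see this ordering concretely, a small sketch (ours, for illustration) can tabulate each growth function at a modest size, here n = 16:

```java
// Evaluates the common growth functions at n = 16 to show how quickly
// they separate: 1, log N, N, N log N, N^2, 2^N, N!.
public class GrowthTable {
    public static long factorial(int n) {
        long f = 1;
        for (int i = 2; i <= n; i++)
            f *= i;
        return f;
    }

    public static void main(String[] args) {
        int n = 16;
        long logN = 4;                          // log2(16) = 4
        long[] values = { 1, logN, n, n * logN,
                          (long) n * n,         // N^2     = 256
                          1L << n,              // 2^N     = 65536
                          factorial(n) };       // N!      = 20922789888000
        for (long v : values)
            System.out.println(v);
    }
}
```

Already at n = 16 the factorial term is roughly 300 million times larger than 2^N, which is why factorial algorithms are hopeless except for tiny inputs.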