Computers are generally used to process large amounts of data. When we run a program on a large input, we must be certain that the program terminates within a reasonable amount of time. Although the running time depends somewhat on the programming language we use, and to a smaller extent on the methodology (such as procedural versus object-oriented), those factors are often unchangeable constants of the design. Even so, the running time is most strongly determined by the choice of algorithm. An algorithm is a clearly specified set of instructions the computer follows to solve a problem. Once an algorithm is given for a problem and determined to be correct, the next step is to determine the amount of resources, such as time and space, that the algorithm will require. This step is called algorithm analysis. An algorithm that requires several gigabytes of main memory is not useful for most current machines, even if it is completely correct.
The runtime of a program depends on many factors: operating system, compiler, language, processor, available memory, data size, and network latency, among others. In this chapter we give a metric that defines the complexity of a program in terms of the data size n, independent of all such environmental variables. Recall that an algorithm is a step-by-step way of solving a problem. Each step of the algorithm requires some processing, and some steps depend on the data size while others do not. For example, a loop such as
for (int i = 0; i < n; i++) do_something();
is a function of the data size n. But a statement such as int x = y + z; is not a function of the data size, so we assume it takes some constant time to execute. For example, finding the max of a sorted array of size n can be done in constant time: we take the first or the last element, depending on whether the array is sorted in decreasing or increasing order. Our goal here is to describe the complexity of an algorithm in terms of the data size n and give a measure as a function of n. Some of the commonly encountered functions of n are shown below.
[Figure: growth rates of common functions of n. Courtesy of Weiss, Data Structures and Algorithms.]
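The constant-time lookup described above can be sketched as follows (a minimal example; the class and method names are ours, and the array is assumed sorted in increasing order):

```java
public class SortedArrayMax {
    // Max of an array sorted in increasing order: simply the last element.
    // This takes constant time, O(1), no matter how large the array is.
    public static int maxOfSortedAscending(int[] a) {
        return a[a.length - 1];
    }

    public static void main(String[] args) {
        int[] sorted = {2, 5, 9, 14, 30};
        System.out.println(maxOfSortedAscending(sorted)); // prints 30
    }
}
```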
Now we give a definition to formalize this process.
Definition: A theoretical measure of the execution of an algorithm, usually the time or memory needed, given the problem size n, which is usually the number of items. Informally, f(n) = O(g(n)) means that, for large n, f(n) is at most some constant multiple of g(n). The notation is read "f of n is big oh of g of n".
Formal Definition: f(n) = O(g(n)) means there are positive constants c and k such that 0 ≤ f(n) ≤ cg(n) for all n ≥ k. The values of c and k must be fixed for the function f and must not depend on n.
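For instance, f(n) = 3n + 5 is O(n): take c = 4 and k = 5, since 3n + 5 ≤ 4n whenever n ≥ 5. A small numeric check of that inequality (the function and class names are ours):

```java
public class BigOCheck {
    static long f(long n) { return 3 * n + 5; }  // f(n) = 3n + 5
    static long bound(long n) { return 4 * n; }  // c * g(n) with c = 4, g(n) = n

    public static void main(String[] args) {
        // 0 <= f(n) <= c*g(n) must hold for every n >= k; here k = 5.
        for (long n = 5; n <= 1_000_000; n *= 10) {
            if (!(0 <= f(n) && f(n) <= bound(n)))
                throw new AssertionError("bound fails at n = " + n);
        }
        System.out.println("3n + 5 <= 4n holds for all sampled n >= 5");
    }
}
```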
[Figure courtesy of NIST.]
In our analysis of algorithms we assume a uniprocessor model of computation with random access memory (RAM), so that each simple operation takes constant time.
Let us consider some known algorithms to get a measure of the runtime. Consider the bubble sort algorithm.
public void bubbleSort() {
    for (int i = 0; i < size - 1; i++)
        for (int j = 0; j < size - i - 1; j++)
            if (A[j] > A[j + 1]) {
                int tmp = A[j];
                A[j] = A[j + 1];
                A[j + 1] = tmp;
            }
}
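The method above refers to the fields A and size of its enclosing class. A self-contained version, with the array passed as a parameter and a small driver (the array contents are illustrative):

```java
import java.util.Arrays;

public class BubbleSortDemo {
    // Same algorithm as in the text, taking the array as a parameter.
    public static void bubbleSort(int[] A) {
        int size = A.length;
        for (int i = 0; i < size - 1; i++)          // one pass per outer iteration
            for (int j = 0; j < size - i - 1; j++)  // bubble the largest remaining element right
                if (A[j] > A[j + 1]) {              // swap out-of-order neighbors
                    int tmp = A[j];
                    A[j] = A[j + 1];
                    A[j + 1] = tmp;
                }
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 8};
        bubbleSort(data);
        System.out.println(Arrays.toString(data)); // prints [1, 2, 4, 5, 8]
    }
}
```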
When we run this code with a standard Java compiler and time it, we notice that the running time is approximately proportional to the square of the data size. A closer analysis of the code shows that the outer loop runs (n-1) times and, for each i from 0 to (n-2), the inner loop runs (n-i-1) times. This leads to the sum (n-1) + (n-2) + … + 1, which can be shown to equal n(n-1)/2, a quadratic function of n. By the definition of Big-O this is O(n²), since n(n-1)/2 is bounded by cn² (for example, c = 1/2) for all n.
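One way to see the n(n-1)/2 count directly is to instrument the two loops and compare the result against the closed form (a sketch; the counter and class name are ours):

```java
public class BubbleSortCount {
    // Counts how many A[j] > A[j+1] comparisons the bubble sort loops
    // perform on an array of n elements.
    public static long countComparisons(int n) {
        long comparisons = 0;
        for (int i = 0; i < n - 1; i++)
            for (int j = 0; j < n - i - 1; j++)
                comparisons++;  // one comparison per inner-loop step
        return comparisons;
    }

    public static void main(String[] args) {
        for (int n : new int[]{10, 100, 1000}) {
            long expected = (long) n * (n - 1) / 2;  // the closed form from the text
            System.out.println("n = " + n + ": counted " + countComparisons(n)
                               + ", formula gives " + expected);
        }
    }
}
```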
Each algorithm can be similarly analyzed in a mathematical sense, and some of the most common complexities are:
O(1) - not proportional to any variable; a fixed/constant amount of time. The algorithm does not depend on the size of the data set.
O(N) - proportional to N; the running time increases linearly with the data size.
O(N²) - proportional to N squared. As you double the size of the data, the program takes about four times as long to run.
O(log N) - these are very good algorithms. Typically, any loop that doubles (or halves) its index as it goes leads to an algorithm of log N complexity.
O(N log N) - good comparison-based sorting algorithms fall into this category. This is a lot better than O(N²).
O(N!) - factorial-time algorithms, which grow even faster than exponential ones. Avoid them at any cost (unless your data size is very small).
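Binary search is a standard instance of the halving pattern mentioned under O(log N): each iteration discards half of the remaining range. A minimal sketch (assumes the array is sorted in increasing order; the class name is ours):

```java
public class BinarySearchDemo {
    // Returns an index of key in the ascending-sorted array a, or -1 if absent.
    // The range [lo, hi] halves on every iteration, so the loop runs O(log N) times.
    public static int binarySearch(int[] a, int key) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;  // written this way to avoid overflow of (lo + hi)
            if (a[mid] == key) return mid;
            else if (a[mid] < key) lo = mid + 1;  // key lies in the upper half
            else hi = mid - 1;                    // key lies in the lower half
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 3, 5, 7, 9, 11};
        System.out.println(binarySearch(sorted, 7));  // prints 3
        System.out.println(binarySearch(sorted, 4));  // prints -1
    }
}
```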
We can order algorithms based on Big-O analysis: the smaller the Big-O, the faster the execution for large n.
O(1) < O(log N) < O(N) < O(N log N) < O(N²) < O(N!)