15-859B: Assignment 2: Multidimensional Optimization

Due: Tue, 24 Oct, 5pm at my office (Newell-Simon 4205), or in class earlier that day.

Summary

Implement and test multidimensional optimization on two or higher dimensional problems. You can implement in the language of your choosing, but Matlab is recommended because that will simplify the graphics. You must write the optimization code yourself. We will compare the results between students on a specified minimization problem to see whose algorithm and implementation found the global minimum in the fewest floating point operations. The winner will earn the title of Champion Optimizer!

The assignment consists of two parts. In Part 1 you do an assigned task; in Part 2 you have several choices.

Part 1

For part 1 of the assignment, you will be randomly assigned an algorithm to implement; one of: Nelder and Mead, Steepest Descent, Newton, BFGS, or Conjugate Gradient. During lecture, we'll "draw out of a hat" (I would like each algorithm to be implemented by at least two people). If you missed that lecture, then email me, or if you don't have time, pick one of these five at random. If you find the description of the algorithm in Heath to be inadequate, then I suggest you consult the original paper on the method, or another book. See Heath's section 6.7 for references.

Use that algorithm to minimize the function of 2 variables given by the Matlab code hump1.m. Note that this code uses random numbers, but that the random number generator is seeded with a fixed value, so the "random" numbers are in fact deterministic.

If you don't implement your optimizer in Matlab, then read this paragraph. Run Matlab to print out the parameters of this function to high precision, then implement exactly the same function in your language. To print the parameters, insert the statement format long g; cen, amp, sigma, rot into hump1.m and run it once. Use double precision floating point, not single precision. Test a few points to verify that your function is equal to hump1.

Find the global minimum. Do this (heuristically and simple-mindedly) by doing 1000 searches, each consisting of:

Pick a uniformly distributed random starting point in [0,1]x[0,1]. If other starting points are needed (as with Nelder-Mead), choose them nearby, say within a distance of .1 .
Iterate your algorithm until convergence or divergence. For consistency, let's define convergence as two consecutive steps moving the current point an xy distance of less than eps=1e-5, and divergence as the current point having an xy distance from the origin of big=10 or more. The "steps" I speak of here are the outer loop steps; if your method has an inner loop for a line search, make up your own convergence and divergence criteria for those steps.
Add any more termination criteria you find necessary, e.g. to detect a limit cycle (hopefully these will not occur).

Take, as your (approximate) global minimum, the lowest converged point.

Don't code-optimize the hump1 function (it's tempting, since my Matlab implementation does the same "random" initialization each time) since doing so will bias your results relative to others' and make comparison more difficult, unless you can figure out a way to compensate for that in your statistics. If you need gradients or Hessians of this function, as several of the algorithms do, you can work those out symbolically. If you write the hump function as sum{ a[i]*exp{-.5*(x-c[i])'*R'*D[i]*R*(x-c[i])} } then from this you can derive symbolic formulas for gradient and Hessian and implement those in your choice of language.

In your implementation, compute the following statistics:

Average number of floating point operations per search, averaged over your 1000 searches.
Recall that in Matlab, "flops(0)" resets the flop counter, and "flops" returns the flops used so far. If you don't use Matlab, count a scalar floating point add, subtract, multiply, or divide as 1 flop, integer arithmetic and sin, cos, and exp as 0 flops (to be consistent with Matlab). If gross inconsistencies emerge from different languages and compilers, we'll see if we can normalize for that after the results are collected.
Average number of function evaluations per search. Also, if applicable, average number of gradient evaluations and Hessian evaluations per search.
Fraction of the 1000 searches that converged to the global minimum (within an xy distance of 10*eps, say).

Pick two of your search paths, one of which converged to the global minimum, and another that went somewhere else, and draw them (by computer) on a contour plot in the manner of Heath's figure 6.5. Put small dots at each function evaluation point. Code to efficiently evaluate the hump1 function for plotting is in plothump.m, which you can run, from Matlab, with the command plothump(3,4). The arguments set the number of humps and the random seed. This command takes a few seconds because it uses a fine grid to get accurate contours.

Adding lines on top of this plot is easy in Matlab. For example, suppose your search iterations are (0,0), (.2,.04), (.4,.16). These can be graphed with:

xv = [0 .2 .4];		% x vector
yv = [0 .04 .16];	% y vector
line(xv, yv, 'Color','r', 'LineWidth',1, 'Marker','o', 'MarkerSize',2);

This draws two line segments in red and three little circles each 5 points in size. You can draw it on top of the contour plot. For more info, see the huge but very readable document "Using Matlab Graphics" among Mathworks' online manuals.

For part 1, turn in: a discussion of two paragraphs to one page summarizing what you did, what choices you made, and what successes/failures you encountered; the above statistics; and the above plot(s); and a code listing.

Part 2

Choose a different optimization algorithm (from Heath, or some other source), and implement it.

For what to optimize, choose between the following:

2A. Optimize the same "hump1" function, and turn in the same kind of information as in part 1. Also add a paragraph of discussion comparing the results of part 1 and part 2. Which algorithm did better and why?
2B. Take some other nontrivial function of two or more variables and optimize it (find the global maximum or minimum within some domain). Nontrivial means it can't be optimized in closed form, as, for example, a quadratic function can. The function you optimize could be a function of interest in your own research, e.g. nonlinear least squares problem, image registration from computer vision, or a design problem from engineering. If you want to do constrained optimization, be my guest. Turn in similar information to part 1. If your search domain has more than two dimensions, you could project the search path to 2-D to visualize it.

15-859B, Introduction to Scientific Computing
Paul Heckbert
9 Oct. 2000.
Minor revisions 10 Oct.