

This is only a rough outline of the material that was covered in this lecture.

Hopfield CAM Review and Example

We began with a review of the Hopfield Content Addressable Memory from the last lecture, illustrating the technique by applying it to problem 2 on the sheet of optional homework problems. The net has only 3 neurons, and is used to store a single pattern (110). By applying the prescription given for determining the weights in terms of the stored patterns, we found the weights, and thus the relation between the inputs to each of the three neurons and the outputs of the other two (part a). Then we applied the iterative procedure for updating the values of V1, V2, and V3, starting with the initial pattern (111), and showed how the net converged to the correct output (110) (part b). Finally, we calculated the "energy" for the initial and final states, showing that the net did indeed converge to the state of lower energy.
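
If you want to check the arithmetic, here is a minimal numerical sketch of this example in Python. It assumes the usual outer-product prescription Wij = (2Vi - 1)(2Vj - 1) for a single stored pattern, a threshold of zero, and the convention that a neuron is left unchanged when its net input is exactly zero; the conventions used in class may differ in detail.

    import numpy as np

    # Stored pattern for the 3-neuron CAM of problem 2: V = (1, 1, 0)
    stored = np.array([1, 1, 0])

    # Outer-product prescription using bipolar values S_i = 2*V_i - 1,
    # with the self-connections set to zero (W_ii = 0).
    S = 2 * stored - 1
    W = np.outer(S, S).astype(float)
    np.fill_diagonal(W, 0.0)

    def energy(V):
        """E = -1/2 sum_ij W_ij V_i V_j (no external input term here)."""
        return -0.5 * V @ W @ V

    def update(V):
        """One pass of updates, neuron by neuron.  A neuron is left
        unchanged when its net input is exactly zero."""
        V = V.copy()
        for i in range(len(V)):
            net = W[i] @ V
            if net > 0:
                V[i] = 1
            elif net < 0:
                V[i] = 0
        return V

    V = np.array([1, 1, 1])                    # initial pattern (111)
    print("initial:", V, "E =", energy(V))     # E = 1 with these conventions
    V = update(V)
    print("final:  ", V, "E =", energy(V))     # settles to (1 1 0), E = -1

With these conventions the initial state (111) has E = 1 and settles to (110) with E = -1 after a single pass of updates.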

The Hopfield-Tank Network

The main topic covered was the Hopfield-Tank network. This is described in the assigned reading, the paper: J. Hopfield and D. W. Tank, ``Computing with Neural Circuits: A Model'', Science 233, p. 625 (8 Aug 1986).

In this paper, Hopfield and Tank proposed a method of solving difficult optimization problems that had conflicting constraints. The technique involved bringing the network to a final output state that minimized a "cost" function or "energy" that is of the same form as the energy of the Ising model spin glass that we discussed in the previous lecture.

Spin glass:

\begin{displaymath}E = -1/2 \sum_{i}\sum_{j}J_{ij}S_iS_j - \sum_{i}h_iS_i\end{displaymath}

with Si = +1 or -1.

Hopfield-Tank network:


\begin{displaymath}E = -1/2 \sum_{i}\sum_{j}W_{ij}V_iV_j - \sum_{i}I_iV_i\end{displaymath}

with Vi tending towards 0 or 1.

This is like the energy term for a Hopfield CAM, but has an extra term involving the quantity Ii that represents external inputs, bias terms, or a combination of both. Like the CAM, every neuron is connected to every other with symmetric weights, Wij = Wji, and with Wii = 0. With slightly different notation, this is Eq. (2) in the Hopfield-Tank paper.

One difference from the Hopfield CAM is that the Hopfield-Tank network has continuous (analog) values of Vi that are forced toward 0 or 1 by a sigmoidal activation curve. Thus, when the total input to a neuron is large and positive, the output tends towards 1, and it tends towards 0 when the total input is a large negative number. It is this continuous value of the Vi's that allows the net to avoid local minima by slowly varying the state of the outputs while converging to a final state.
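
For example, a sigmoid of the form

\begin{displaymath}V_i = g(u_i) = \frac{1}{1 + e^{-u_i/u_0}}\end{displaymath}

has this behavior, where ui is the total input to neuron i and u0 sets the steepness (the gain). The exact functional form used by Hopfield and Tank may differ, but any smooth, monotonic curve that saturates at 0 and 1 will do.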

As with the CAM, for the right kinds of problems, the weights of the Hopfield-Tank network can be determined directly, without the need of training cycles. The idea is to embody the various constraints of the problem into a cost function E of the form given above. The weights Wij and input terms Ii are chosen so that E will be a minimum when the constraints imposed on the outputs are obeyed, and we get the outputs that we want. Like a spin glass, it will be a frustrated system - some constraints will conflict with others, so we try to find the lowest energy that we can under the circumstances.

Different external inputs will give different values of the Ii, shifting the shape of the energy surface. Unlike the CAM, in which the desired output state is the local minimum corresponding to the stored pattern that is closest to the initial state, here we want a global minimum. The Wij and Ii have been chosen so that for a given set of Ii (corresponding to a particular input), the energy is a minimum when the net has the desired output for that input.

The network can be implemented either by building an electrical circuit (using operational amplifiers, resistors and capacitors), or by solving a set of differential equations.
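
In either case the underlying dynamics are the same. A common way of writing them (in notation matching the energy expression above; the paper's notation differs slightly) is

\begin{displaymath}C_i\frac{du_i}{dt} = \sum_{j}W_{ij}V_j - \frac{u_i}{R_i} + I_i, \qquad V_i = g(u_i)\end{displaymath}

where ui is the input voltage of amplifier (neuron) i, Ci and Ri are its input capacitance and resistance, and g is the sigmoidal activation function. For symmetric weights, and in the high-gain limit, these equations drive E steadily downhill, which is why the network settles into a minimum of E.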

A guided tour of the Hopfield-Tank A/D converter example

The 4-bit analog-to-digital converter was used in the Hopfield-Tank paper to illustrate this technique. Four "neurons" are reciprocally connected to each other, and to an input source of analog voltage, X. Here 0 <= X < 16, and we want the four Vi's to settle down to binary values that represent the closest integer representation of X, i.e., an approximation to X = 8V3 + 4V2 + 2V1 + V0.

We do this by constructing a cost function that is smallest when the binary number represented by the Vi's is as close as possible to X, and that penalizes values of Vi that are not near 0 or 1.

The resulting expression is:


\begin{displaymath}E = 1/2 \left(X - \sum_{i=0}^{3}2^iV_i\right)^2 + \sum_{i=0}^{3}2^{2i-1}V_i(1-V_i)\end{displaymath}

This is like Eq. (3) in the paper, with one important difference. It should be obvious to you that the negative sign in front of the first term of the equation in the paper is a mistake!

Then, we can find the weights Wij and the parameters Ii by inspection. It is just a matter of comparing the general energy expression in Eq. (2) with the specific form in Eq. (3), and finding the values of Wij that correspond to the coefficients of ViVj, and the values of Ii that correspond to the coefficients of Vi. For a given X, the two energy expressions should be equal (except for an additive constant). The results were given in Eq. (4) of the paper:

\begin{displaymath}W_{ij} = -2^{i+j} \mbox{ for } i \neq j, \quad W_{ij} = 0 \mbox{ for } i = j, \quad \mbox{and } I_i = -2^{2i-1} + 2^iX\end{displaymath}

Exercise: Derive this result.
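
To see these weights in action, here is a rough Python sketch that integrates the network's differential equations for the 4-bit converter with simple Euler steps. The input value X = 9, the gain, the decay constant, the step size, and the number of steps are guesses chosen for illustration, not values from the paper; with these choices the outputs should settle close to the binary code for 9 (1001).

    import numpy as np

    # Rough sketch of the 4-bit A/D converter network.  Weights and inputs
    # are taken from Eq. (4); the gain u0, the decay constant tau, the step
    # size dt, and the number of steps are guesses chosen for illustration.

    X = 9.0                                   # analog input, 0 <= X < 16
    i = np.arange(4)

    W = -np.power(2.0, i[:, None] + i[None, :])             # W_ij = -2^(i+j)
    np.fill_diagonal(W, 0.0)                                 # W_ii = 0
    I = -np.power(2.0, 2 * i - 1) + np.power(2.0, i) * X     # I_i = -2^(2i-1) + 2^i X

    def g(u, u0=1.0):
        """Sigmoidal activation squashing the input u into (0, 1)."""
        return 1.0 / (1.0 + np.exp(-u / u0))

    # Integrate du_i/dt = -u_i/tau + sum_j W_ij V_j + I_i with Euler steps.
    tau, dt = 10.0, 0.01
    u = np.zeros(4)                           # start with every V_i = 0.5
    for _ in range(20000):
        V = g(u)
        u += dt * (-u / tau + W @ V + I)

    V = g(u)
    bits = (V > 0.5).astype(int)
    print("final V :", np.round(V, 3))
    print("digital :", int(bits @ 2 ** i), " (input X =", X, ")")

Note that the final Vi's are not exactly 0 or 1; they are simply driven close enough to one or the other to read off the digital answer.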

Hopfield and Tank also described an application of this method to the Traveling Salesman Problem. The TSP is an example of a class of difficult optimization problems with conflicting constraints. The object is to choose an itinerary that visits each city exactly once and that also minimizes the distance traveled. This was done in a similar manner, by constructing a cost function that penalizes long paths and those that skip a city or visit a city more than once.

The difficulty with binary-valued outputs

It would be tempting to use strictly binary values for the states Vi of a Hopfield-Tank network. This would seem to be a natural representation for a problem like the A/D converter, or others whose result is represented by binary values. But, unlike the case of the Hopfield CAM, we need to find the global minimum of E, not just the nearest local minimum. This presents difficulties for a binary-valued Hopfield network, which gives very poor solutions to this kind of problem. We can see this by looking at some plots of energy surfaces. These are easier to visualize if we have only two dimensions for the outputs and a third for the energy, so the figure here is for a two-bit A/D converter, with the four cases of an input of 0, 1, 2, or 3. The four corners of a surface represent the possible output states V1V0 = 00, 01, 10, and 11.



For an input voltage of 2, the output state (10) has the lowest energy. If we were to start with the state (00) and change V0 to minimize the energy, we would change it from 0 to 1 and get to the state (01). Then we move on to neuron 1, and see if we should change V1 from 0 to 1. Unfortunately, the state (11) has no lower energy than the state (01), so we leave V1 at 0. When we go back to neuron 0, we find that we cannot reduce the energy any further by changing the value of V0. If we were to change it back to 0, the energy would increase, so we are stuck in the local minimum corresponding to the state (01).

The problem with our procedure is that we are using a "greedy algorithm": we always move in the direction of decreasing energy, without ever allowing the energy to increase so that we might later reach a lower minimum. The continuous-valued version of the Hopfield-Tank network avoids this problem by allowing small changes in V0 and V1 that let us follow a gradual descent from the state (00) to the state (10). If we like the idea of using binary-valued neurons instead of ones with analog values, then we need to find another approach.
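
To make the difficulty concrete, here is a small Python sketch of this greedy procedure for the two-bit converter with an input of X = 2. The scan order (neuron 0 first) and the rule of accepting only strict decreases in E are assumptions; with them, the procedure stops in the state (01) instead of the global minimum (10).

    # Greedy single-bit-flip search for the two-bit converter with X = 2.
    # V[0] is the low-order bit V0, V[1] is the high-order bit V1.

    def energy(V, X=2.0):
        """Cost function for a 2-bit A/D converter (same form as above)."""
        value = 2 * V[1] + V[0]
        penalty = sum(2 ** (2 * i - 1) * V[i] * (1 - V[i]) for i in range(2))
        return 0.5 * (X - value) ** 2 + penalty

    V = [0, 0]                           # start in state (00)
    improved = True
    while improved:
        improved = False
        for i in range(2):               # try flipping each bit in turn
            trial = V.copy()
            trial[i] = 1 - trial[i]
            if energy(trial) < energy(V):    # accept only strict decreases
                V = trial
                improved = True

    print("stuck at V1V0 =", f"{V[1]}{V[0]}", "with E =", energy(V))
    # prints (01) with E = 0.5 -- not the global minimum (10) with E = 0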

Simulated Annealing

One very powerful technique is called Simulated Annealing. The idea behind simulated annealing is that, instead of being greedy and only making changes in the Vi's that decrease E, we find a set of Vi's that is representative of the system at some high temperature, and gradually lower the temperature toward T = 0, where we expect E to be a minimum. This approach is modeled on the process of annealing glass or metals by allowing them to cool slowly after being heated. At high temperatures, the thermal energy allows the atoms (or neurons, in the case of a neural net) to get out of local minima in the energy, so that they can then fall into lower minima. This is possible because at high temperatures there can be large thermal fluctuations in the energy, which allow the system some probability of increasing its energy as well as decreasing it. As the temperature is lowered, the fluctuations in energy become smaller, and it becomes more difficult to leave these lower energy states.

In the case of cooling a molten glass or metal, the temperature T is related to the kinetic energy of motion of the molecules. In a magnetic spin system, T can be related to the distribution of Si's between high and low energy states. (If you have studied statistical thermodynamics, you may recall the Boltzmann distribution, in which the probability of having an energy E is related to the temperature T.)
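
For reference, the Boltzmann distribution says that the probability of finding the system in a particular state with energy E is proportional to

\begin{displaymath}P \propto e^{-E/k_BT}\end{displaymath}

where kB is Boltzmann's constant (in the neural network context it is usually absorbed into T). At high T, states of quite different energy are almost equally likely; at low T, the system is almost always found near its minimum energy.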

The Metropolis Monte Carlo method is an algorithm that can be applied to simulate an Ising model spin glass in thermal equilibrium at a temperature T. It has been used extensively by physicists to study Ising-type magnetic spin systems. Kirkpatrick, Gelatt and Vecchi recognized that the Metropolis algorithm can also be used for the Hopfield-Tank network, or for any optimization problem in which the constraints can be incorporated into a cost function that has the same form as the energy of a spin glass. The steps in the algorithm were given in the handout, but not discussed in class. From the details of the algorithm, it should be apparent that small increases in E are more probable than large ones, and that these increases become less probable as T becomes smaller. The Monte Carlo method is optional material and won't be covered on the exam.

However, we did discuss the application of this method to find a global minimum in the "energy" (or "cost function") of a Hopfield-Tank network. The steps are given in the last part of the handout, "Finding a Global Minimum by Simulated Annealing". When applied to binary valued neural networks or any other system with binary states that has a cost, or "energy" function, the Monte Carlo method is used to bring the system to a state characterized by a temperature T. Then the temperature is gradually lowered, as described above.
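
As a rough illustration (not the exact procedure given in the handout), here is a minimal simulated-annealing sketch in Python, applied to the same two-bit example used above. The starting temperature, cooling schedule, and number of trials per temperature are arbitrary choices. Unlike the greedy procedure, this one can accept occasional uphill moves, so it escapes the local minimum at (01) and, with high probability, ends in the global minimum (10).

    import math, random

    def energy(V, X=2.0):
        """Cost for the 2-bit converter; V is binary, so the Vi(1-Vi) term vanishes."""
        value = 2 * V[1] + V[0]
        return 0.5 * (X - value) ** 2

    V = [0, 0]                               # start in state (00)
    T = 2.0                                  # initial "temperature"
    while T > 0.01:
        for _ in range(20):                  # a few Metropolis trials per temperature
            i = random.randrange(2)          # pick a bit at random and try flipping it
            trial = V.copy()
            trial[i] = 1 - trial[i]
            dE = energy(trial) - energy(V)
            # accept downhill moves always, uphill moves with probability exp(-dE/T)
            if dE <= 0 or random.random() < math.exp(-dE / T):
                V = trial
        T *= 0.9                             # gradually lower the temperature

    print("annealed to V1V0 =", f"{V[1]}{V[0]}", "with E =", energy(V))
    # with high probability this ends in the global minimum (10)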

For further information about the Metropolis Monte Carlo Method and Simulated Annealing, see S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi, ``Optimization by Simulated Annealing'', Science 220, p. 671 (1983).




Dave Beeman, University of Colorado
dbeeman "at" dogstar "dot" colorado "dot" edu
Thu Nov 8 15:55:44 MST 2001