Reference: J. J. Hopfield, ``Neural networks and physical systems with emergent collective computational abilities'', Proc. Natl. Acad. Sci. USA 79, 2554 (1982)
In this, and the next lecture, we will study a type of neural network that was inspired by analogies with the physics of magnetism.
[Figure: two sketches of a collection of atoms. Low T, E --> minimum: the atoms form an ordered, crystalline solid. High T, S --> maximum: the atoms are disordered, as in the liquid state.]
When the system of interest is too complicated to deal with mathematically, it is common to look for a simpler system (or even an idealized model system) that has similar behavior. The ferromagnetic phase transition is interesting because magnetic systems can be described with fewer variables (the directions of the magnetic moments of the atoms, which assume discrete quantized values). This makes their analysis somewhat simpler, although still difficult. At low temperatures a pure crystal of iron in a very weak magnetic field will be completely magnetized, with the magnetic moment of each atom pointing in the same direction. At about 1043 K (the Curie temperature), the magnetization abruptly vanishes.
The key to being able to quantitatively describe the free energy and related thermodynamic quantities is to express the total energy of the system in terms of the states of the atoms.
The magnetic moment of a magnetic atom (Fe, Ni, Co, for example) is proportional to its quantized spin angular momentum, S. The energy of a collection of magnetic atoms can be written as

E = - Σij Jij Si·Sj - Σi hi·Si

where the first sum runs over pairs of atoms i and j, and the second over all atoms i.
Here, the vector Si is the spin of the ith atom, and hi is proportional to the local magnetic field at the site of the ith atom. Often, it is due to an externally applied magnetic field and is the same for every atom. The dot product in the first term means that if Jij is positive, then the energy will be a minimum (most negative) if Si and Sj point in the same direction. The second term minimizes the energy when Si is in the same direction as the local magnetic field.
Jij represents the "exchange coupling" between atom i and atom j. This is a quantum mechanical effect involving the overlap of electron wave functions. It depends on the distance between the two atoms, weakening with distance and alternating in sign. A plot of Jij as a function of the distance between the ith and jth atoms looks roughly like an exponentially damped cosine (although it isn't described by such a simple function).
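As a concrete (made-up) example of evaluating this energy, here is a short Python sketch for a few classical spins; the coupling matrix J, the uniform field h, and the spin directions are arbitrary toy values, not data for any real material.

    import numpy as np

    # Toy example: 3 classical unit spins in 3D (rows of S), a symmetric
    # coupling matrix J, and a uniform local field h (all assumed values).
    S = np.array([[0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0]])
    J = np.array([[ 0.0, 1.0, -0.5],
                  [ 1.0, 0.0,  0.3],
                  [-0.5, 0.3,  0.0]])
    h = np.array([0.0, 0.0, 0.1])           # same field at every site

    def magnetic_energy(S, J, h):
        """E = - sum over pairs (i<j) of Jij Si.Sj  -  sum over i of h.Si"""
        exchange = 0.0
        n = len(S)
        for i in range(n):
            for j in range(i + 1, n):
                exchange -= J[i, j] * np.dot(S[i], S[j])
        field = -np.sum(S @ h)
        return exchange + field

    print(magnetic_energy(S, J, h))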
Types of ordered magnetic systems that undergo a phase transition:
The Ising model is a simplified model, in which the spins point either "up" (S = 1) or "down" (S = -1). The Ising model version of a spin glass would have the energy

E = - Σij Jij Si Sj,
usually with the assumption that the interactions run over all pairs of spins, and the exchange couplings Jij have a Gaussian distribution about zero. An approximate solution describing the spin glass phase transition was found by David Sherrington and Scott Kirkpatrick in 1975. Since then, the Ising spin glass has been extensively studied with Monte Carlo computer simulations. To learn more about the history of the Ising model, see the Digression on the Ising Model.
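For readers who want to experiment, here is a minimal sketch of such a Monte Carlo simulation: an Ising spin glass with Gaussian couplings updated by the Metropolis rule. The system size, temperature, number of sweeps, and the 1/sqrt(N) coupling scale are choices made for illustration, not taken from the lecture.

    import numpy as np

    rng = np.random.default_rng(0)

    N = 64           # number of Ising spins (arbitrary)
    T = 1.0          # temperature in units of the coupling scale (arbitrary)
    sweeps = 1000

    # Gaussian couplings about zero, symmetric, no self-coupling.
    J = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N))
    J = (J + J.T) / 2.0
    np.fill_diagonal(J, 0.0)

    S = rng.choice([-1, 1], size=N)   # random initial spin configuration

    def energy(S, J):
        """E = - sum over pairs of Jij Si Sj (J symmetric, zero diagonal)."""
        return -0.5 * S @ J @ S

    for sweep in range(sweeps):
        for _ in range(N):
            i = rng.integers(N)
            # Energy change if spin i is flipped: dE = 2 Si * sum_j Jij Sj
            dE = 2.0 * S[i] * (J[i] @ S)
            # Metropolis rule: accept downhill moves, accept uphill with prob exp(-dE/T)
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                S[i] = -S[i]

    print("final energy per spin:", energy(S, J) / N)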
This is the background behind John Hopfield's model of a neural network that acts as a content addressable memory.
The goal is to design a neural network that can be used to store patterns, and that will give back the proper stored pattern when it is presented with a partial match to the pattern. For example, instead of associating a phone number with a name by some indexing scheme that tells what memory location holds the phone number, the memory directly uses the name to get the number, or vice versa. For this reason, it is called a Content Addressable Memory, or auto-associative memory. What goes on in your mind when you try to associate a name with a face?
The Hopfield CAM model has three assumptions:
i.e. the connection strength between neuron j and neuron i is the same as that between neuron i and neuron j (Wij = Wji), and a neuron does not connect to itself (Wii = 0).
(Note that there is no bias or external input for the original Hopfield model.)
For each neuron i, in turn, set

Vi --> 1 if Σj Wij Vj > 0,   Vi --> 0 if Σj Wij Vj < 0

(and leave Vi unchanged if the sum is exactly zero).
Repeat this until there is no further change in the ``energy''

E = - (1/2) Σij Wij Vi Vj

that is minimized by this procedure. This is only guaranteed to converge if the updates are applied asynchronously, i.e. using the new value of Vi right away, instead of updating all of the V's at the end of a pass through the net. This ``energy'' function is an example of a Lyapunov function. This is the term used by mathematicians to describe a function of the variables characterizing the system that tends to a minimum as an update rule is applied. Often it is hard to find such a function, or even to prove that one exists.
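A short Python sketch of this procedure (added for illustration, with an arbitrary random symmetric weight matrix) shows the asynchronous update rule and confirms numerically that the ``energy'' never increases:

    import numpy as np

    rng = np.random.default_rng(1)

    def energy(W, V):
        """E = -1/2 sum_ij Wij Vi Vj  (W symmetric, zero diagonal)."""
        return -0.5 * V @ W @ V

    def update_async(W, V, rng):
        """One pass of asynchronous updates; returns True if any neuron changed."""
        changed = False
        for i in rng.permutation(len(V)):
            h = W[i] @ V
            if h > 0:
                new_state = 1
            elif h < 0:
                new_state = 0
            else:
                new_state = V[i]       # leave unchanged when the input sum is exactly zero
            if new_state != V[i]:
                V[i] = new_state       # use the new value right away (asynchronous)
                changed = True
        return changed

    # Toy network: random symmetric weights with zero diagonal (assumed values).
    N = 20
    W = rng.normal(size=(N, N))
    W = (W + W.T) / 2
    np.fill_diagonal(W, 0.0)

    V = rng.integers(0, 2, size=N)     # random initial 0/1 state
    E_prev = energy(W, V)
    while update_async(W, V, rng):
        E_now = energy(W, V)
        assert E_now <= E_prev + 1e-9  # energy is non-increasing (small float tolerance)
        E_prev = E_now
    print("settled into a stable state with energy", E_prev)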
Except for the fact that the allowed states of a neuron are 0 and 1 instead of ±1, this is mathematically equivalent to the magnetic spin problem. In this case, nature has given us a Lyapunov function, so we know that this network will tend to a steady state.
There are three questions that we should ask at this point:
Q: How do we present a pattern to the network and recall a stored memory?

A: The pattern is presented by initializing all the Vi to an input pattern. For example, these might correspond to pixel values for a digitized letter, or the binary representation for a name, phone number and social security number. This is typically an incomplete or degraded pattern, and we want to recall the complete stored pattern. We then use the update rule given above to let the network evolve to a steady state. This will hopefully result in the complete stored pattern that is closest to the input pattern.
Q: How do we store a set of patterns in the network?

A: To store a set of patterns (labeled by the index p), set the weights to

Wij = Σp (2Vi^p - 1)(2Vj^p - 1)   for i ≠ j   (and Wii = 0),
where Vi^p is the set of desired states for pattern p. The nice feature of this network is that no training cycles are needed. We have an explicit expression for the weights in terms of the desired outputs.
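Putting the storage rule and the recall procedure described above together, here is a small self-contained Python sketch; the number of neurons, the random patterns, and the number of corrupted bits are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(2)

    N = 100                                      # number of neurons (arbitrary)
    patterns = rng.integers(0, 2, size=(3, N))   # three random 0/1 patterns to store

    # Storage rule: Wij = sum_p (2Vi^p - 1)(2Vj^p - 1), with Wii = 0.
    S = 2 * patterns - 1
    W = S.T @ S
    np.fill_diagonal(W, 0)

    def recall(W, V):
        """Asynchronous updates until the state stops changing."""
        V = V.copy()
        changed = True
        while changed:
            changed = False
            for i in rng.permutation(len(V)):
                h = W[i] @ V
                new_state = V[i] if h == 0 else int(h > 0)
                if new_state != V[i]:
                    V[i] = new_state
                    changed = True
        return V

    # Present a degraded version of the first stored pattern: flip 10 random bits.
    probe = patterns[0].copy()
    flip = rng.choice(N, size=10, replace=False)
    probe[flip] = 1 - probe[flip]

    result = recall(W, probe)
    print("bits recovered correctly:", np.sum(result == patterns[0]), "out of", N)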
Q: Why does this work?

A: This can be more easily seen if we use state variables that assume values of ±1 instead of 0 or 1. In class, we made the substitution Si = (2Vi - 1) and found that when the energy is expressed in terms of Si, the most significant term is of the form

E = - (1/2) Σij Wij Si Sj,
and the weights are given by

Wij = Σp Si^p Sj^p,   where Si^p = (2Vi^p - 1).
Using this expression for the weights, we get

E = - (1/2) Σij Σp Si^p Sj^p Si Sj = - (1/2) Σp ( Σi Si^p Si )^2 + constant

(the constant comes from the missing i = j terms, since Wii = 0).
Suppose that we only have a single pattern p to store. Then E is at a minimum (the most negative) when

Si = Si^p for every neuron i,

so that Σi Si^p Si takes its largest possible value, N,
and the update rule leads us to a final state that is the same as the stored pattern.
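As a quick numerical check of this claim (a sketch with an arbitrarily chosen pattern), the energy built from a single stored pattern is most negative at that pattern and rises as spins are flipped away from it:

    import numpy as np

    rng = np.random.default_rng(4)

    N = 50
    S_p = rng.choice([-1, 1], size=N)            # a single stored +/-1 pattern
    W = np.outer(S_p, S_p).astype(float)         # Wij = Si^p Sj^p
    np.fill_diagonal(W, 0.0)

    def E(S):
        return -0.5 * S @ W @ S

    # Energy at the stored pattern vs. at states with a few spins flipped.
    print("E at stored pattern:", E(S_p))
    for flips in (1, 5, 25):
        S = S_p.copy()
        S[rng.choice(N, size=flips, replace=False)] *= -1
        print(f"E with {flips:2d} spins flipped:", E(S))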
If we have more than one pattern, the weights aren't optimum for the retrieval of any one pattern, but represent an average or compromise over the set of patterns. Some frustration develops, in the spin-glass sense that no single state can satisfy all of the terms in the energy at once. Instead of having a single minimum energy state, we will have a local minimum for each pattern. If the initial state of the network isn't too far from the stored state, the system will slide into the nearest local minimum, which will be the desired output state.
This figure represents a contour map with lines of constant energy for different values of the output of two of the neurons. (For N neurons, this would be an N-dimensional plot.) The circles each represent a "basin of attraction", or local minimum, corresponding to a stored pattern. If the initial state is displaced slightly from one of these, the network state will tend toward the stored pattern.
Normally, we think of a local minimum as something to be avoided. When we try to minimize the error in a feedforward net by using the backpropagation algorithm, getting stuck in a local minimum can keep us from finding the global minimum that produces the best set of weights. In this case, local minima are desirable. We don't want the system to settle down to some global minimum of the energy instead of finding the state corresponding to the nearest local minimum.
What happens if you try to store too many patterns? What is the capacity of the Hopfield associative memory? Hopfield found in his computer experiments that the ability of the network to retrieve patterns fell dramatically when the number of stored patterns approached 15% of the number of neurons. As the number of patterns becomes large, Wij begins to look like a random variable. The term in the sum that favors a particular pattern is greatly outweighed by all the others, and the associative memory begins to look like a true spin glass.
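A rough way to see this crossover for yourself is an experiment along the following lines (a sketch reusing the same storage and recall routines as above; the network size, corruption level, and pattern counts are arbitrary). For 100 neurons, exact recall should typically begin to fail as the number of stored patterns approaches Hopfield's roughly 15% figure.

    import numpy as np

    rng = np.random.default_rng(3)
    N = 100                                      # number of neurons (arbitrary)

    def store(patterns):
        """Hebbian weights from 0/1 patterns: Wij = sum_p (2Vi^p-1)(2Vj^p-1), Wii = 0."""
        S = 2 * patterns - 1
        W = S.T @ S
        np.fill_diagonal(W, 0)
        return W

    def recall(W, V, max_sweeps=50):
        V = V.copy()
        for _ in range(max_sweeps):
            changed = False
            for i in rng.permutation(len(V)):
                h = W[i] @ V
                new_state = V[i] if h == 0 else int(h > 0)
                if new_state != V[i]:
                    V[i], changed = new_state, True
            if not changed:
                break
        return V

    for num_patterns in (5, 10, 15, 20, 25):
        patterns = rng.integers(0, 2, size=(num_patterns, N))
        W = store(patterns)
        # Probe with each stored pattern after flipping 5 bits; count exact recoveries.
        recovered = 0
        for p in patterns:
            probe = p.copy()
            flip = rng.choice(N, size=5, replace=False)
            probe[flip] = 1 - probe[flip]
            if np.array_equal(recall(W, probe), p):
                recovered += 1
        print(f"{num_patterns:3d} patterns stored: {recovered}/{num_patterns} recalled exactly")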
People are fond of saying that the Hopfield net is like a spin glass, but the important distinction is that a properly behaving Hopfield net doesn't have a random connectivity matrix. A real spin glass would make a very poor associative memory.