

Some specific models of artificial neural nets

In the last lecture, I gave an overview of the features common to most neural network models. Recall the diagram summarizing the way that the net input u to a neuron is formed from any external inputs, plus the weighted outputs V from other neurons. This is used to form an output V = f(u), by one of various input/output relationships (step function, sigmoid, etc.). These usually involve a threshold parameter, theta. At the bottom of that figure, there is a typical network, with input units receiving external inputs, hidden units which communicate only with other neurons, and output units whose outputs are visible to the outside world.
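
As a quick reminder in code (a hedged sketch in Python; the function names are mine, and the sigmoid is chosen just as one example of the input/output relationships mentioned):

    import math

    def net_input(external, weights, outputs):
        # u = external input plus the weighted outputs V of the other neurons
        return external + sum(w * v for w, v in zip(weights, outputs))

    def sigmoid(u, theta):
        # one common choice of f(u), with threshold parameter theta
        return 1.0 / (1.0 + math.exp(-(u - theta)))

    V = sigmoid(net_input(0.5, [0.2, -0.3], [1.0, 0.0]), theta=0.0)
    print(V)   # about 0.67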

Today, we will start our examination of some specific models.

McCulloch-Pitts Model

In 1943, the neurophysiologist Warren McCulloch and the logician Walter Pitts published the first paper describing what we would call a neural network. Their "neurons" operated under the following assumptions:

  1. They are binary devices (Vi = [0,1])
  2. Each neuron has a fixed threshold, theta
  3. The neuron receives inputs from excitatory synapses, all having identical weights. (However, it may receive multiple inputs from the same source, so the excitatory weights are effectively positive integers.)
  4. Inhibitory inputs have an absolute veto power over any excitatory inputs.
  5. At each time step the neurons are simultaneously (synchronously) updated by summing the weighted excitatory inputs and setting the output (Vi) to 1 if and only if the sum is greater than or equal to the threshold AND the neuron receives no inhibitory input.

We can summarize these rules with the McCulloch-Pitts output rule

        V_i(t+1) = 1   if the sum of the excitatory inputs V_j(t) is >= theta_i and no inhibitory input is active
        V_i(t+1) = 0   otherwise

and the corresponding diagram

Using this scheme we can figure out how to implement any Boolean logic function. As you probably know, with a NOT function and either an OR or an AND, you can build up XOR's, adders, shift registers, and anything you need to perform computation.
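
As a concrete illustration, here is a minimal sketch of the McCulloch-Pitts rule in Python (the function name mp_neuron and the list-based inputs are my own choices, not part of the original model description):

    def mp_neuron(excitatory, inhibitory, theta):
        """McCulloch-Pitts unit: output 1 iff the summed excitatory
        inputs reach theta and no inhibitory input is active."""
        if any(inhibitory):                      # absolute veto
            return 0
        return 1 if sum(excitatory) >= theta else 0

    # Two excitatory inputs with theta = 2: both must be on.
    print(mp_neuron([1, 1], [], theta=2))        # -> 1
    # Any active inhibitory input vetoes the output, regardless of excitation.
    print(mp_neuron([1, 1], [1], theta=2))       # -> 0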

We represent the output for various inputs as a truth table, where 0 = FALSE, and 1 = TRUE. You should verify that when W = 1 and theta = 1, we get the truth table for the logical NOT,

        Vin  |  Vout
        -----+------
          1  |   0
          0  |   1

by using this circuit:

With two excitatory inputs V1 and V2, and W = 1, we can get either an OR or an AND, depending on the value of theta:

        OR:   theta = 1
        AND:  theta = 2

Can you verify that with these weights and thresholds, the various possible inputs for V1 and V2 result in this table?


        V1 | V2 | OR | AND
        ---+----+----+----
         0 |  0 |  0 |  0
         0 |  1 |  1 |  0
         1 |  0 |  1 |  0
         1 |  1 |  1 |  1
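
A quick way to check this table is to run the rule over all four input pairs. A short Python sketch (the helper name is mine), using W = 1 for both inputs, theta = 1 for the OR and theta = 2 for the AND:

    def mp_threshold(inputs, theta):
        # W = 1 for every excitatory input; no inhibitory inputs needed here
        return 1 if sum(inputs) >= theta else 0

    print("V1 V2 OR AND")
    for v1 in (0, 1):
        for v2 in (0, 1):
            or_out  = mp_threshold([v1, v2], theta=1)
            and_out = mp_threshold([v1, v2], theta=2)
            print(v1, v2, or_out, and_out)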

The exclusive OR (XOR) has the truth table:

        V1 | V2 | XOR
        ---+----+----
         0 |  0 |  0 
         0  |  1  |  1       (Note that this is also the sum
         1  |  0  |  1        bit of a 1-bit half adder.)
         1 |  1 |  0 

It cannot be represented with a single neuron, but the relationship
XOR = (V1 OR V2) AND NOT (V1 AND V2) suggests that it can be represented with the network

Exercise: Explain to your own satisfaction that this generates the correct output for the four combinations of inputs. What computation is being made by each of the three "neurons"?
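
One plausible wiring, consistent with the relation above (my reading of the missing figure, so treat the details as an assumption): an OR unit with theta = 1, an AND unit with theta = 2, and an output unit that gets an excitatory input from the OR unit (theta = 1) plus an inhibitory input from the AND unit, which vetoes the output whenever both V1 and V2 are 1. In Python:

    def mp_neuron(excitatory, inhibitory, theta):
        # McCulloch-Pitts rule: inhibition vetoes; otherwise threshold the sum
        if any(inhibitory):
            return 0
        return 1 if sum(excitatory) >= theta else 0

    print("V1 V2 XOR")
    for v1 in (0, 1):
        for v2 in (0, 1):
            or_unit  = mp_neuron([v1, v2], [], theta=1)    # V1 OR V2
            and_unit = mp_neuron([v1, v2], [], theta=2)    # V1 AND V2
            xor_out  = mp_neuron([or_unit], [and_unit], theta=1)
            print(v1, v2, xor_out)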

These results were very encouraging, but such networks displayed no learning. They were essentially "hard-wired" logic devices: one had to figure out the weights and connect up the neurons in the appropriate manner to perform the desired computation. Thus there was no real advantage over a conventional digital logic circuit. Their main importance was that they showed that networks of simple neuron-like elements could compute.

The Perceptron

The next major advance was the perceptron, introduced by Frank Rosenblatt in his 1958 paper. The perceptron had the following differences from the McCulloch-Pitts neuron:

  1. The weights and thresholds were not all identical.
  2. Weights can be positive or negative.
  3. There is no absolute inhibitory synapse.
  4. Although the neurons were still two-state, the output function f(u) takes the values [-1,1], not [0,1]. (This is no big deal, as a suitable change in the threshold lets you transform from one convention to the other.)
  5. Most importantly, there was a learning rule.

In a slightly more modern and conventional notation (and with Vi = [0,1]), we can describe the perceptron like this:

This shows a perceptron unit, i, receiving various inputs Ij, weighted by a "synaptic weight" Wij.

The ith perceptron receives its input from n input units, which do nothing but pass on the input from the outside world. The output of the perceptron is a step function:

        V_i = f(u_i),   with  f(u) = 1 for u >= 0  and  f(u) = 0 for u < 0

and

        u_i = sum_j W_ij V_j + theta_i

For the input units, V_j = I_j. There are various ways of implementing the threshold, or bias, theta_i. Sometimes it is subtracted from the input u instead of added to it, and sometimes it is included in the definition of f(u).
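
In code, a single perceptron unit with these conventions (bias added to the weighted sum, step function returning 1 for u >= 0) might look like this sketch; the names are mine:

    def perceptron_output(weights, inputs, theta):
        """Step-function perceptron: V = f(u), with u = sum_j W_j * I_j + theta."""
        u = sum(w * x for w, x in zip(weights, inputs)) + theta
        return 1 if u >= 0 else 0

    # Example: weights and bias chosen by hand so the unit computes an OR.
    print(perceptron_output([1.0, 1.0], [0, 1], theta=-0.5))   # -> 1
    print(perceptron_output([1.0, 1.0], [0, 0], theta=-0.5))   # -> 0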

A network of two perceptrons with three inputs would look like:

Note that they don't interact with each other - they receive inputs only from the outside. We call this a "single layer perceptron network" because the input units don't really count. They exist just to provide an output that is equal to the external input to the net.
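
A single-layer network is then just several such units applied independently to the same inputs. A minimal sketch for two perceptrons and three inputs (the numbers are arbitrary, for illustration only):

    # A 2 x 3 weight matrix, one bias per output unit, and three external inputs.
    W = [[0.2, -0.5, 1.0],
         [0.7,  0.3, -0.1]]
    theta = [-0.4, 0.1]
    I = [1, 0, 1]

    def step(u):
        return 1 if u >= 0 else 0

    # Each output unit sees only the external inputs, never the other output unit.
    V = [step(sum(w * x for w, x in zip(W[i], I)) + theta[i]) for i in range(2)]
    print(V)   # -> [1, 1] for these particular numbers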

The learning scheme is very simple. Let ti be the desired "target" output for a given input pattern, and Vi be the actual output. The error (called "delta") is the difference between the desired and the actual output, and the change in the weight is chosen to be proportional to delta.

Specifically,

        delta_i = t_i - V_i

and

        Delta W_ij = eta * delta_i * V_j

where eta is the learning rate.

Can you see why this is reasonable? Note that if the output of the ith neuron is too small, the weights from its active inputs are all increased, raising its total input. Likewise, if the output is too large, those weights are decreased, lowering the total input. We'll better understand the details of why this works when we take up back propagation. First, an example.
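
The rule itself is only a line or two of code. A hedged sketch (function and variable names are mine):

    def perceptron_update(weights, inputs, target, output, eta):
        """Delta rule: each weight changes by eta * (target - output) * input."""
        delta = target - output
        return [w + eta * delta * x for w, x in zip(weights, inputs)]

    # If the output was too small (delta > 0), the weights on active inputs grow:
    print(perceptron_update([0.0, 0.0], [1, 0], target=1, output=0, eta=0.5))
    # -> [0.5, 0.0]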

Perceptron learning of OR (by pattern)

Before we can start, we have to ask, "how can we use this rule to modify the threshold or bias term, theta?"

Answer: treat theta as the weight from an additional input which is always "on" (V = 1). Now, consider the net:

Unit 3 (the perceptron) receives inputs from the two input units 1 and 2, weighted by W31 and W32, and a constant input of 1, weighted by theta3.

Let eta = 0.5 and initially set all the weights (W_31, W_32, and theta_3) to 0.

Then, we have

        u_3 = W_31 V_1 + W_32 V_2 + theta_3

and

        V_3 = f(u_3) = 1 for u_3 >= 0,  0 for u_3 < 0.

The error term is delta_3 = t_3 - V_3. This means that the change in each weight will be eta * delta_3 * V_j, and the change in the bias is eta * delta_3.

Now fill in this table showing the results of each iteration, stopping when there is no further change through the presentation of all four patterns. We call each set of four patterns an "epoch". In this case, we are "training by pattern" because we adjust the weights after each pattern. Sometimes, nets are "trained by epoch", with the net change in weights applied after each epoch. (I'll do the first iteration.)

    |     |     |      |      |         | new  | new  |  new
V_1 | V_2 | t_3 |  u_3 |  V_3 | delta_3 | W_31 | W_32 | theta_3
----+-----+-----+------+------+---------+------+------+---------
 0  |  0  |  0  |   0  |   1  |  -1     |  0   |  0   | -0.5
 0  |  1  |  1  |      |      |         |      |      |
 1  |  0  |  1  |      |      |         |      |      |
 1  |  1  |  1  |      |      |         |      |      |
----+-----+-----+------+------+---------+------+------+---------
 0  |  0  |  0  |      |      |         |      |      |
 0  |  1  |  1  |      |      |         |      |      |
 1  |  0  |  1  |      |      |         |      |      |
 1  |  1  |  1  |      |      |         |      |      |
----+-----+-----+------+------+---------+------+------+---------
 0  |  0  |  0  |      |      |         |      |      |
 0  |  1  |  1  |      |      |         |      |      |
 1  |  0  |  1  |      |      |         |      |      |
 1  |  1  |  1  |      |      |         |      |      |
----+-----+-----+------+------+---------+------+------+---------
 0  |  0  |  0  |      |      |         |      |      |
 0  |  1  |  1  |      |      |         |      |      |
 1  |  0  |  1  |      |      |         |      |      |
 1  |  1  |  1  |      |      |         |      |      |
----+-----+-----+------+------+---------+------+------+---------

How many epochs does it take until the perceptron has been trained to generate the correct truth table for an OR? Note that, except for a scale factor, this is the same result that McCulloch and Pitts deduced for the weights and bias without letting the net do the learning. (Do you see why a positive threshold for an M-P neuron is equivalent to adding a negative bias term in the expression for the perceptron total input u?)
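
If you want to check your hand-filled table, the short Python sketch below trains unit 3 on the four OR patterns, by pattern, with eta = 0.5 and W_31, W_32, and theta_3 all starting at 0, printing one row per presentation and stopping after the first epoch that produces no change. The variable names and formatting are my own; the first printed row should match the row worked out above.

    patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    eta = 0.5
    W31, W32, theta3 = 0.0, 0.0, 0.0

    for epoch in range(1, 11):                   # more epochs than we should need
        changed = False
        for (V1, V2), t3 in patterns:
            u3 = W31 * V1 + W32 * V2 + theta3 * 1
            V3 = 1 if u3 >= 0 else 0             # step function, f(u) = 1 for u >= 0
            delta3 = t3 - V3
            if delta3 != 0:
                changed = True
            W31    += eta * delta3 * V1
            W32    += eta * delta3 * V2
            theta3 += eta * delta3 * 1           # bias = weight from a constant input of 1
            print(epoch, V1, V2, t3, u3, V3, delta3, W31, W32, theta3)
        if not changed:
            break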




Dave Beeman, University of Colorado
dbeeman "at" dogstar "dot" colorado "dot" edu
Tue Oct 30 12:19:58 MST 2001