

Computer models of mind and brain - Artificial Intelligence

This is the "and Computers" part of the course. If I had to choose one phrase to describe the topics we will cover, I'd pick "machine intelligence" - simulating Brain or Mind with a computer.

Use of computers to model some aspect of "brain" or "mind".

There are many different approaches and lots of ways to categorize them, but I like to group them into three topics:

  1. Computational neuroscience
  2. Artificial Intelligence
  3. Artificial Neural Networks

I talked a little bit about the first topic earlier in the semester. Today I'll talk some about the second. But, the emphasis for the rest of my lectures will be on the third. Some of you may want to attack the first two in your projects.

So, what is the point of making computer models of "brain" or "mind"? I can think of two main reasons why we might want to do this -

  1. pure Science - try to understand "how the brain computes" or "how intelligence works" by testing hypotheses with computer simulations
  2. practical Engineering - learn how to make "better" or "more intelligent" computers

Scientists and people who are interested in understanding fundamentals tend to be interested in the first goal, and engineers and people with a more practical orientation and an interest in applications tend to focus on the second. Clearly, there is a lot of overlap between these groups -- an engineer needs to know something about the first goal in order to achieve the second. Of course, a philosopher will want to start by answering the question "what is intelligence?" This can be a lot of fun to debate with friends late at night after a few beers, but I'm not going to say much about it because it could take forever. We all recognize that although computers are very fast, there are many things which people, and even very primitive animals, do better and faster. For example, which of these is not a tree? Or, what does this represent?

	(viewgraph of trees and telephone pole) - all are "tree-like"
	( line drawing of a cat               ) - very stylized

These are hard problems for a computer to solve, even though we can solve them easily. All three approaches (Computational Neuroscience, AI, and Artificial Neural Nets) attempt to understand how problems like these can be solved. It's interesting to think about what we do when we solve them. What goes on in your mind or in your brain when you solve one of these problems?

This introspective approach is basically the one taken by traditional artificial intelligence (AI). How would you tell someone what you did to solve these problems? (Or is your answer just a rationalization of something you did much less consciously?)

Computational Neuroscience falls mostly into the first category of goals, although it may shed some light on how to accomplish the goals of the second. The other two approaches, AI and Neural Networks, both offer a more practical approach toward implementing "machine intelligence". In order to understand how the brain works, we might have to model it down to the detail of single ion channels, but (at least with present computer technology) we wouldn't expect our simulations to be a practical way of performing "brain-like" computations. We need to leave out some details.

So, how do we make a computer "intelligent"?

This knotty problem is like a tangled ball of wire which we can attack from two sides. The two different practical approaches are "Traditional AI" and "Artificial Neural Nets". The first approach has its roots in psychology, and might be called "Minds and Computers". It tends to focus on high level abstractions like "the mind". The other approach, which I'll call "Brains and Computers", tries to apply what biology and physiology tell us about the way the brain works.

    Traditional AI       ---> "Machine Intelligence" <--- Neural Nets
    "Minds & Computers"            /\/~~\               "Brains & Computers"
    psychology (behavior) ___/\ /\ \/\/\ \              biology (physiology)
    high level,top-down,       X /   /\ \_X___/~~~~~    low level,bottom-up,
    macroscopic               / V   /  \/\__/           microscopic
                              ~~~~~~
Traditional Artificial Intelligence - falls into both categories of goals - represents the opposite extreme in terms of the level of microscopic detail in the models. The models tend to be high level abstract models based more on psychology or linguistics, rather than on biology. (Example: Freudian psychology uses non-biological concepts like the ego, superego and id. Jungian psychology uses others. Even if there are no biological structures corresponding to these functions, they may be useful models for understanding mental processes.) AI has been used as a tool for understanding "how intelligence works". But, I think it is most useful as a practical tool for making computers more like minds.

Artificial Neural Networks - largely the second category (although many workers in this field believe that simplified models can shed light on the workings of the brain). Here, the idea is to perform computations with networks of neuron-like elements. The approach has a lot in common with computational neurobiology, but we would like to leave out as many of the complicated biological details as we can safely ignore.

People with different interests and backgrounds tend to have different opinions/prejudices about the best end of the problem to attack. Having worked as a solid state physicist, I like the bottom-up approach. If I want to understand the behavior of fluids, I start with a computer simulation of interacting molecules and try to understand their macroscopic behavior on the basis of what we know about the microscopic behavior of the components. On the other hand, someone who needs to predict the behavior of 40 weight racing oil won't find this approach very helpful, at least in the short term. He or she may need a more empirical approach, using a lot of "black boxes" which aren't understood in detail. Also, we'll see that some types of "intelligent behavior" are better treated with one approach than the other. You might think about which approach seems best for the two pattern recognition problems which I posed. Do you take a cognitive approach (thinking about how you would describe the differences between a tree and a telephone pole)? Or is your process less conscious? Or do you think about how a frog recognizes a fly? A frog probably doesn't intellectualize things much. A pattern falls across its retina, and ZOT!

The nice thing about being an engineer is that you get to pick and choose among the various alternatives, selecting the one that seems to offer the best possibilities at the time. Sometimes when untangling a ball of wire, you work on one loose end until progress slows down, and then switch to the other. You hope to eventually meet in the middle. My own opinion is that recently, the Artificial Neural Net approach has been more fruitful - particularly if it is based on an understanding of the biology of the brain.

Another thing that I probably don't need to warn you about is that any time someone offers you a nice clear-cut duality like the one I've posed here, you should be VERY suspicious. You've been around long enough to know that these sorts of dichotomies are, at best, convenient but over-simplified approximations. There's a lot more overlap between these categories than I've implied. We have people in the psychology department here at CU who study both artificial and biological neural networks. People in the CS department work with both AI and neural network approaches to problems in linguistics. Marvin Minsky, who is a big name in traditional AI, has had a large influence (not completely positive) on the development of neural network theory. Plenty of biologists and neural network researchers take their direction from ideas about the behavior of the "mind". Some of them even speculate on fuzzy ideas like "consciousness". As we see progress from both directions, these groups will meet in the middle. Over time, we'll see the AI models become more grounded in biological fact, and the Neural Net models become more sophisticated and hierarchical in their organization. At some stage, it may no longer be convenient to make distinctions between these approaches.

I'm going to give you a very brief overview of the AI approach today. Then I'll concentrate on the Neural Net approach for the next six lectures.

If you would like to do some more reading on AI and artificial neural nets, you may want to look at this list of references, which gives some suggestions for optional reading. (This is not very up-to-date, however.)

Overview of traditional AI

When I say "AI" I mean traditional AI, because many people like to classify the neural net approach as just another part of AI. (This is not an unreasonable thing to do.)

The goals of what I call "traditional AI" are essentially the ones that I mentioned before: to use computer models to help us understand "intelligence", and to find ways to make computers exhibit intelligent behavior. Although I won't try to define intelligence, it doesn't hurt to try to list a few of the attributes of intelligent behavior. What things distinguish intelligent human behavior from a cleverly written computer program? Any ideas?

It might include the ability to:

Of course, there are many definitions of AI. The one I like the best comes from the book by Rich in the list of references: "AI is the study of how to make computers do things which, at the moment, people do better". It points out that AI is a moving target. We call something AI if we are on the verge of getting a computer to do it, but once we are successful, the mystery disappears and it becomes just another clever computational technique.

One question which we could ask (but probably not answer) is: If a program exhibits the outward signs of intelligence, but does it in a completely different way from the way people do it, is it intelligent? In class, we discussed

The answer to this question probably depends on your goals - understanding the mind and intelligence vs. practical applications. However, to solve truly hard problems, it may not be possible to avoid paralleling human thought processes. Does evolution tend to produce optimum solutions? There is evidence that, at least in some areas, it does. (Human rod cells can respond to single photons, and the information-processing efficiency of fly vision has been shown to be near the theoretical limit.)

Historical outline of AI applications and techniques

I'll list some applications, and give some history along the way. One reason for listing some of these potential applications and goals of Artificial Intelligence is so that later, after we know something about artificial neural networks, we can ask: "which of these are best solved by neural nets and which are best left to traditional AI techniques?".

    Game playing - limited domain of knowledge - appeared easy, but has a
      "branching factor" of 35 for chess - leads to a "combinatorial explosion"
	  Alan Turing - early '50s chess program

	  Arthur Samuel - 1952 checkers - pioneered modern search and learning
             strategies - "informed search" - evaluation functions for
	     "goodness" of move

          John McCarthy (one of AI's founding fathers, inventor of LISP)
	   1966 arranged a computer chess match between the US and Russia
	    - neither played very good chess

	  1967 Richard Greenblatt of MIT AI lab wrote first of modern chess
	   programs, MacHack (see "Hackers" by Steven Levy)

	  1989 - Neurogammon (Backprop ANN) won the international computer
	   backgammon championship, competing against programs using
	   traditional AI game-playing strategies (an important milestone)

	Theorem proving - specialized knowledge and formal logic, but still
	    requires "good judgment" and intuition - can learn a lot about
	    thought processes by trying to write such a program - how would
	    you teach someone good strategies for proving trig identities?
	    - introspection is a popular tool for AI
	    
	Natural language processing,  machine translation, speech recognition
	    (this last is very difficult - "It's hard to wreck a nice beach")

	    1950's success with trivial problems ==> disillusionment

	    1954 Georgetown U - Russian/English translation for petroleum
	    engineering had 250 words, 6 rules - looked promising.
	    But was harder than it looked - An early translation to Russian
	    and back to English gave "the vodka is good, but the meat
	    is rotten"  ("The spirit is willing, but the flesh is weak".)
	    An important application: intelligent database retrieval, or
	    an intelligent internet search engine
	    
	    1966 ALPAC report - $20 million wasted ==> 1970's "winter of AI"

            1973 in England, the "Lighthill report" drew similar conclusions
	    and largely stopped AI research funding there for a decade.
	    lesson: knowledge representation is important!
	    - resurgence and hype in the '80s
	    software ads "new and improved with AI" - like a detergent
	    the popularity of Neural Nets has had a similar history

	Vision/pattern recognition - industrial and military applications

	Robotics - uses vision, plus problem solving, dealing with
	    obstacles, formulating goals and plans - "put the red block on
	    top of the yellow one" (The red block may be under the blue one.)

	Automatic Programming - output high level code from specifications

        Scheduling - optimal path - Traveling Salesman Problem -
            applications to manufacturing

        Machine Learning - performance should improve with experience
	
	Expert Systems - probably the most commercially successful area in AI
	  - replace an expert in some specialized (and commercially
	  profitable) domain

Expert Systems

Expert Systems have undoubtedly been the most successful AI application in recent years, and have been largely responsible for the resurgence of interest in AI, so they merit a little more discussion. Cognitive scientists who are trying to understand the mind and the nature of human thought don't find these very interesting, but they work! It will be interesting to compare the way that an expert system program and a neural net simulation solve the same problem.

They are called Expert Systems because they replace a human expert in some specialized (and usually commercially profitable) domain. They are also called Rule Based Systems or Knowledge Based Systems because they incorporate the expert's knowledge in an explicit set of rules.

Examples of some Expert Systems:

    MYCIN - medical diagnoses in a specialized domain (infectious blood
     diseases)
    XCON - used by DEC to configure VAX computer systems from a customer's
     order (there are many decisions to be made about physical placement of
     components in cabinets, cabling, power supplies, etc.)  A more modern
     example would be computerized layout of semiconductor chips.
    PROSPECTOR - decision making in mineral exploration - "is it worthwhile
     digging here?" - it discovered a large mineral deposit
    DENDRAL - deduces a chemical structure from mass spectrograms

There are also numerous programs for diagnosing equipment problems, giving tax advice, scheduling in manufacturing, etc. These are examples of certain types of problems for which this approach works well. All of them have some things in common:

Components of an RBS (Rule Based System):

  1. Rule base or collection of rules, stated as IF-THEN pairs or condition-action pairs
  2. Data base - Long Term Memory (LTM) for unchanging facts and Short Term Memory (STM) (working memory) for temporary storage of information relating to the current state of the problem solution, short term goals, agendas, etc.
  3. Control strategy (inference engine) to select which rule to use -- usually embodies some conflict resolution strategy to deal with the case when more than one rule is applicable.

The control strategy or inference engine works in a loop, identifying rules that apply, choosing a rule and applying it. This modifies the contents of working memory, so that other rules may apply. The process repeats until the goal is reached, or no rules apply. There are various conflict resolution schemes which are used to pick which rule to use when more than one is applicable.
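
To make this loop concrete, here is a minimal sketch of a forward-chaining engine in Python. The rule representation (a name plus condition and action functions over a set of facts) and the trivial "first match" conflict resolution are my own illustrative choices, not the design of any particular expert system; smarter conflict resolution schemes are listed later in these notes.

    # A rule is a (name, condition, action) triple: 'condition' tests the
    # working memory (here just a set of facts), and 'action' returns the
    # new facts its THEN part asserts.
    def run(rules, working_memory, goal):
        while goal not in working_memory:
            # 1. Match: find rules whose IF part holds and which would
            #    actually add something new (this avoids looping forever).
            applicable = [r for r in rules
                          if r[1](working_memory)
                          and not r[2](working_memory) <= working_memory]
            if not applicable:
                return False              # no rules apply - give up
            # 2. Conflict resolution: trivially, take the first match.
            name, condition, action = applicable[0]
            # 3. Apply: the action changes working memory (the STM), which
            #    may make other rules applicable on the next pass.
            working_memory |= action(working_memory)
        return True                       # goal reached

    # A toy run, in the spirit of the car-repair rules mentioned below:
    rules = [("check-battery",
              lambda wm: "starter does not turn over" in wm,
              lambda wm: {"check the battery"})]
    print(run(rules, {"starter does not turn over"}, "check the battery"))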

Sometimes, the components are arranged a little differently. The Rule Base and LTM can also be considered as part of the permanent "Knowledge Base", with the temporary STM shown in a separate box. The interactions between the components are the same, but the division of the knowledge base shown here emphasizes the differences between what we call Procedural and Declarative memory. This is justified by evidence that you are using different neural circuitry when learning to drive a car than when you are memorizing facts.

This approach was proposed in the early 60's as a model of methods used by people to solve problems. The concepts of LTM and STM arise from psychological experiments and have a basis in biology. For example, STM seems to obey the 7 ± 2 rule. When memorizing strings of numbers or items displayed on a table, most people can only hold about 7 ± 2 "chunks" of information in their mind at one time unless they are converted into long term memory. Of course, we don't have to abide by this size limitation in designing an expert system. However, it is useful to make this distinction between LTM and STM, and have an area of memory for temporary knowledge that will soon be thrown out.

Also, the rule base resembles a set of stimulus-response pairs which mirror the way experts solve a problem: "Well, in the case of so-and-so, I usually do such-and-such". "If the starter doesn't turn over, I check the battery. If the battery is dead, then I see if it will hold a charge."

Whether or not you like this as a good cognitive model, you have to agree that it is an effective technique for problem solving, if used on the right kinds of problems.

Here are some possible conflict resolution schemes: (Not covered in lecture)

Specificity ordering
- pick the rule with the most specific conditions to be met (this narrows down the search space) - or which leads to the most specific conclusions
Size ordering
- pick the rule that has the largest number of conditions in the "IF" part
Rule ordering
- prearrange them in a priority list and pick the first one in the list that is satisfied (easy for the programmer - hard for the "knowledge engineer")
Recency ordering
- pick the applicable rule that was most recently used
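
In terms of the minimal engine sketched above, each of these schemes is just a different way of ranking the set of applicable rules. Here is a sketch of three of the orderings, assuming a richer rule record (the explicit condition list and the "last used" counter are invented for the example; specificity ordering would additionally need some measure of how narrow a rule's conditions are):

    from dataclasses import dataclass

    @dataclass
    class Rule:                  # a richer rule record than the tuples above
        name: str
        conditions: list         # the individual clauses of the IF part
        action: object           # function returning new facts (THEN part)
        last_used: int = -1      # cycle on which this rule last fired

    # Size ordering: the rule with the most conditions in its IF part wins.
    def by_size(applicable):
        return max(applicable, key=lambda r: len(r.conditions))

    # Recency ordering: the most recently fired applicable rule wins
    # (assumes the engine stamps 'last_used' each time a rule fires).
    def by_recency(applicable):
        return max(applicable, key=lambda r: r.last_used)

    # Rule ordering: keep rules in a priority list and take the first
    # applicable one - exactly what the minimal sketch above already does.
    def by_priority(applicable):
        return applicable[0]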

An example: DENDRAL

In a mass spectrometer, the unknown compound is bombarded with a beam of electrons, expelling electrons and breaking it into a number of positively charged fragments. These are accelerated to a known velocity by an electric field, and deflected into a circular path by a magnetic field. From the radius of the path and the known applied fields, it is possible to calculate the mass/charge ratio of the detected particles (the relevant relations are summarized after the structure below). The viewgraph for DENDRAL shows the mass spectrogram plot of the relative intensity measured by the detector as a function of the mass/charge ratio of the particles, and the chemical formula and structure for a particular compound:


      C8H16O

This formula can be known from quantitative analysis, but there are 698 possible combinations of the 8 carbon, 16 hydrogen and one oxygen atoms in a molecule. The actual structure, which has to be determined from the mass spectrometer results, is:

      CH3-CH2-C(=O)-CH2-CH2-CH2-CH2-CH3
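
For reference (standard mass spectrometer physics, not part of the original notes): the magnetic force supplies the centripetal force, so qvB = mv^2/r and m/q = Br/v for ions of known speed v. If the ions are instead accelerated through a known potential difference V, then qV = (1/2)mv^2, which gives m/q = B^2 r^2 / (2V).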
Some of the DENDRAL rules are:

Rule 74:

    IF   The spectrum for the molecule has two peaks
         at masses X1 and X2 such that:
             X1 + X2 = M + 28
                 and
             X1 - 28 is a high peak
                 and
             X2 - 28 is a high peak
                 and
             at least one of X1 or X2 is high
    THEN The molecule contains a ketone group

Rule 75:

    IF    There is a high peak at mass 71
              and
          There is a high peak at mass 43      
              and
          There is a high peak at mass 86
              and
          There is a high peak at mass 58
    
    THEN  There must be an N-propyl-ketone3 structure      

Note that rule 75 leads to more specific conclusions than 74. If both of these apply, the conflict resolution scheme might choose rule 75 instead of 74. The conclusions of either of these rules would presumably be part of the condition of others.

How does an expert system program differ from a program in Pascal or C with lots of IF-THEN-ELSE or CASE statements? The important distinction is that the rules are used as Data, rather than Code, i.e. it is "data driven" rather than "procedural". (We can see this from the example rules for the DENDRAL system, and from the sketch after the two lists below.) This has some important consequences:

  1. There is no fixed order of application of the rules - it is determined by the interaction of the control strategy and the database
  2. Rules interact only indirectly through the database - they don't "call" or branch to each other
  3. The global database can be accessed by all the rules - no part of it is local to any of them in particular
  4. With a sophisticated control strategy and a large number of rules, it might be impossible for a human to predict the order in which the rules would be applied to a complicated problem. (This doesn't mean that it is random or non-deterministic, though.) This is why we might call this "AI" - we may not be clever enough to figure out the optimum order of applying the rules.
This has a number of advantages:
  1. It is much easier to add new rules as the system becomes more sophisticated or we learn more about the problem through experience (modularity --> modifiability)
  2. Most intelligent action is strongly data-driven - as new information is added, behavior changes.
  3. It allows useful, but unplanned interactions or sequences of actions which couldn't have been anticipated by the programmer.
  4. The system can explain its conclusions by keeping track of the order in which rules were applied. No doctor will believe a computer program's diagnosis unless it can do this!
  5. Allows the programmer, user, and expert to participate in the program development to a much greater extent than with a conventional program. The "knowledge engineer" is the middleman in this interaction.
  6. It allows new but similar problems to be solved by changing the rules and/or facts. Expert system shells make this easy (OPS5, and the many variants for PCs, are the hottest-selling category of AI software). However, the disadvantage to this is that the appropriate control strategy may depend strongly on the problem to be solved, and the details of the implementation could depend on the way in which the rules and data are represented. Knowledge representation is a very important topic in AI.
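
To make the rules-as-data point concrete, here is one way Rule 74 might be written down for the minimal engine sketched earlier. The spectrum representation (a dict mapping mass to relative intensity), the "high peak" threshold, and the peak values are all invented for this illustration, not DENDRAL's actual encoding:

    # Rule 74 written as data for the minimal engine sketched earlier.
    # 'peaks' maps mass to relative intensity and M is the molecular
    # weight; the "high peak" threshold of 50 is a made-up number.
    def make_rule_74(peaks, M):
        high = lambda x: peaks.get(x, 0) > 50
        def condition(working_memory):
            return any(x1 + x2 == M + 28
                       and high(x1 - 28) and high(x2 - 28)
                       and (high(x1) or high(x2))
                       for x1 in peaks for x2 in peaks)
        return ("rule-74", condition,
                lambda working_memory: {"molecule contains a ketone group"})

    # Adding knowledge means appending data; no control flow is rewritten.
    # (M = 128 for C8H16O; the peak masses and intensities are invented.)
    rules.append(make_rule_74(peaks={29: 60, 57: 100, 71: 80, 99: 40}, M=128))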

Discuss: how would we program a computer to recognize an insect? Would we use an expert system, or something more like "frog intelligence"?

Some other AI techniques

(This is optional material not covered in class)

Some of these will be just "buzzwords" without adequate explanation. The idea is just to identify some things that you might want to learn about some day.

	Search - searching a decision tree - like "20 questions"
           "combinatorial explosion" - a "cost function" may be used
           to prune the tree
	
	Inference and logical deduction  - deduction (cause --> effect),
           induction (generalization), and abduction (effect --> cause)
           	  - PROLOG language for predicate logic

        Fuzzy Logic - often incorporated into Expert Systems

	Semantic networks - represent interrelationships in a way that
	  facilitates deductions - property inheritance (LISP and PROLOG
	  have features that make it easy to implement this)

	  Example: a network of relationships that apply to Clyde the
          elephant (What color is Clyde?  Can elephants move?) -
          This representation has the neural analogy of
          "spreading activation".  (A small code sketch of property
          inheritance appears at the end of this list.)

	Frames - slot (category) and filler (information content) notation,
          developed by Minsky (MIT, '70s) - represent stereotypes
          which help us make sense of a situation - You walk into an
          unfamiliar room and have certain expectations (windows on walls,
          chairs on the floor and not on the ceiling, etc.) A robot should
          recognize a rectangle on the wall with light entering as a
          window.  When understanding a report of an earthquake in a
          newspaper, one expects the location, number killed, dollar amount
          of damages, etc.  Organizing information this way allows a
          program to represent knowledge in a way that allows it to answer
          questions.  (A good program will handle exceptions well.)

	Scripts - Schank (Yale) - "John went into the restaurant and sat
          down.  After waiting a long time, he got angry and left."  (Did
          he eat?  Why was he angry?)  - similar to frames in the sense
          that it makes use of a stereotyped sequence of events.  A
          "restaurant script" involves someone coming to take your order
          before you eat.
These last three are forms of knowledge representation as well as techniques for querying an "intelligent" program.
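
Returning to the Clyde example above, here is a minimal sketch of property inheritance in Python. The toy network and the query function are my own invention (LISP or PROLOG would express this more naturally):

    # A toy semantic network: each node has properties, plus an "isa" link
    # to the more general category it belongs to.  All data invented.
    network = {
        "Clyde":    {"isa": "elephant"},
        "elephant": {"isa": "mammal", "color": "gray"},
        "mammal":   {"isa": "animal"},
        "animal":   {"can_move": True},
    }

    def lookup(node, prop):
        # Property inheritance: look on the node itself, then follow the
        # "isa" links upward until the property is found (or we run out).
        while node is not None:
            if prop in network.get(node, {}):
                return network[node][prop]
            node = network.get(node, {}).get("isa")
        return None

    print(lookup("Clyde", "color"))      # gray (inherited from elephant)
    print(lookup("Clyde", "can_move"))   # True (inherited from animal)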




Dave Beeman, University of Colorado
Tue Oct 12 13:15:18 MDT 2004