FOGA-92: FOUNDATIONS OF GENETIC ALGORITHMS
Vail, CO, July 24-29, 1992

------------------------------

"Genetic Algorithms Are NOT Function Optimizers"

Kenneth A. De Jong
Computer Science Department, George Mason University, Fairfax, VA 22030, USA
kdejong@aic.gmu.edu

Abstract: Genetic Algorithms (GAs) have received a great deal of attention regarding their potential as optimization techniques for complex functions. The level of interest and success in this area has led to a number of improvements to GA-based function optimizers and a good deal of progress in characterizing the kinds of functions that are easy/hard for GAs to optimize. With all this activity, there has been a natural tendency to equate GAs with function optimization. However, the motivating context of Holland's initial GA work was the design and implementation of robust adaptive systems. In this paper we argue that a proper understanding of GAs in this broader adaptive-systems context is a necessary prerequisite for understanding their potential application to any problem domain. We then use these insights to better understand the strengths and limitations of GAs as function optimizers.

------------------------------

"Modeling simple genetic algorithms"

Michael Vose (vose@cs.utk.edu)

Abstract: Two models of the simple genetic algorithm are reviewed, extended, and unified. The result incorporates both short-term (transient) and long-term (asymptotic) GA behavior. This leads to a geometric interpretation of genetic search which explains population trajectories.

------------------------------

"Remapping Hyperspace During Genetic Search: Canonical Delta Folding"

Keith Mathias and Darrell Whitley
Department of Computer Science, Colorado State University, Fort Collins, Colorado 80523
mathiask, whitley@cs.colostate.edu

Abstract: Delta coding, an iterative strategy for genetic search, has been empirically shown to improve genetic search performance for some optimization problems. However, detailed analyses reveal that the remapping of hyperspace produced by delta coding is rudimentary. The underlying mechanisms of delta coding are examined and a more robust remapping method is introduced. Canonical delta folding provides a canonical ordering of the strings in the search space and then employs a powerful folding mechanism for remapping hyperspace. The behavior of canonical delta folding for remapping deceptive functions is examined using a set of exact equations for modeling the computational behavior of a simple genetic algorithm.

------------------------------

"Population Diversity in an Immune System Model: Implications for Genetic Search"

Robert E. Smith
Dept. of Engineering Mechanics, The University of Alabama, P.O. Box 870278, Tuscaloosa, AL 35487

Stephanie Forrest
Dept. of Computer Science, University of New Mexico, Albuquerque, NM 87131
forrest@unmvax.cs.unm.edu

Alan S. Perelson
Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545
asp@receptor.lanl.gov

Abstract: In typical applications, genetic algorithms (GAs) process populations of potential problem solutions to evolve a single population member that specifies an "optimized" solution. The majority of GA analysis has focused on these optimization applications. In other applications (notably "learning classifier systems" and certain connectionist learning systems), a GA searches for a population of cooperative structures that jointly perform a computational task. This paper presents an analysis of this type of GA problem. The analysis considers a simplified genetics-based machine learning system: a model of an immune system. In this model, a GA must discover a set of pattern-matching "antibodies" that effectively match a set of "antigen" patterns. Analysis shows how a GA can automatically evolve and sustain a diverse, cooperative population. The cooperation emerges as a natural part of the antigen-antibody matching procedure. This emergent effect is shown to be similar to fitness sharing, an explicit technique for multi-modal GA optimization. The results imply that procedures like those in the immune system model could promote diverse, cooperative populations without the explicit, global calculations required by fitness sharing.
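As a point of reference for the comparison drawn above, the sketch below contrasts explicit fitness sharing with a simplified best-match credit scheme of the kind an antigen-antibody model suggests. It is our illustration only: the matching rule, the sharing radius, and the credit scheme are assumptions, not details taken from the paper.

    import random

    def hamming_match(antibody, antigen):
        """Match score: number of positions where antibody and antigen agree
        (an illustrative matching rule)."""
        return sum(a == g for a, g in zip(antibody, antigen))

    def shared_fitness(pop, raw_fitness, sigma_share=3.0):
        """Explicit fitness sharing: divide raw fitness by a niche count based
        on pairwise Hamming distance (the explicit, global calculation the
        abstract refers to)."""
        shared = []
        for i, x in enumerate(pop):
            niche = 0.0
            for y in pop:
                d = sum(a != b for a, b in zip(x, y))
                if d < sigma_share:
                    niche += 1.0 - d / sigma_share
            shared.append(raw_fitness[i] / niche)
        return shared

    def immune_style_credit(antibodies, antigens, trials=200):
        """Implicit niching: repeatedly present a random antigen and credit only
        the best-matching antibody (a simplified reading of the antigen-antibody
        procedure; the paper's scoring details may differ)."""
        credit = [0.0] * len(antibodies)
        for _ in range(trials):
            antigen = random.choice(antigens)
            scores = [hamming_match(ab, antigen) for ab in antibodies]
            credit[scores.index(max(scores))] += max(scores)
        return credit

Because each antigen rewards only its best-matching antibody, antibodies that specialize on different antigens largely stop competing with one another, which is one way to see why the emergent effect resembles explicit sharing.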
------------------------------

"Genetic Set Recombination"

Nicholas J. Radcliffe (njr@epcc.ed.ac.uk)
Edinburgh Parallel Computing Centre, University of Edinburgh
James Clerk Maxwell Building, The King's Buildings, Mayfield Road, Edinburgh EH9 3JZ, Scotland

Abstract: The application of genetic algorithms to optimisation problems for which the solution is a set or multiset (bag) is considered. A previous extension of schema analysis, known as forma analysis, is further developed and used to construct principled representations and operators for problems in this class. The extensions to forma analysis include the introduction of genes whose values cannot be assigned independently and a method for mediating between desirable but sometimes incompatible properties of recombination operators.

------------------------------

"REAL-CODED GENETIC ALGORITHMS AND INTERVAL-SCHEMATA"

Larry J. Eshelman and J. David Schaffer (lje@philabs.philips.com, ds1@philabs.philips.com)
Philips Laboratories, North American Philips Corporation, 345 Scarborough Road, Briarcliff Manor, New York 10510

Abstract: In this paper we introduce interval-schemata as a tool for analyzing real-coded genetic algorithms (GAs). We show how interval-schemata are analogous to Holland's symbol-schemata and provide a key to understanding the implicit parallelism of real-valued GAs. We also show how they support the intuition that real-coded GAs should have an advantage over binary coded GAs in exploiting local continuities in function optimization. On the basis of our analysis we predict some failure modes for real-coded GAs using several different crossover operators and present some experimental results that support these predictions. We also introduce a crossover operator for real-coded GAs that is able to avoid some of these failure modes.
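To make the interval-schema intuition concrete, here is a minimal sketch of a blend-style crossover for real-coded chromosomes, in which each child gene is sampled from an interval defined by the two parent genes. The alpha widening factor, the clipping to variable bounds, and the function name are our illustrative choices, not the operator specification from the paper.

    import random

    def blend_crossover(parent1, parent2, alpha=0.5, bounds=None):
        """Sample each child gene uniformly from the interval spanned by the
        parent genes, optionally widened by a factor alpha (illustrative
        parameterization of an interval-based crossover)."""
        child = []
        for i, (a, b) in enumerate(zip(parent1, parent2)):
            lo, hi = min(a, b), max(a, b)
            spread = alpha * (hi - lo)
            gene = random.uniform(lo - spread, hi + spread)
            if bounds is not None:  # keep each gene inside its variable's range
                gene = max(bounds[i][0], min(bounds[i][1], gene))
            child.append(gene)
        return child

    # Example: recombine two 3-dimensional real-valued parents.
    print(blend_crossover([0.1, 2.0, -1.5], [0.4, 1.0, -0.5],
                          bounds=[(0.0, 1.0), (0.0, 5.0), (-2.0, 0.0)]))

With alpha set to 0 the offspring stay inside the interval spanned by the parents; a positive alpha trades that bias for wider exploration.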
------------------------------

"Analyzing Deception in Trap Functions"

Kalyanmoy Deb and David E. Goldberg (deb@gal1.ge.uiuc.edu and goldberg@vmd.cso.uiuc.edu)

Abstract: A flat-population schema analysis is performed to find conditions for full deception in trap functions. It is found that the necessary and sufficient condition for an l-bit fully deceptive trap function is that all order l-1 schemata are misleading, and it is observed that the trap functions commonly used in a number of test suites are not fully deceptive. Further analysis suggests that in a fully deceptive trap function, the locally optimal function value may be as low as 50% of the globally optimal function value. In this context, the limiting ratio of the locally and globally optimal function values is calculated for a number of fully deceptive functions currently in use. The analysis indicates that trap functions allow more flexibility in designing a deceptive function. It is also found that the proportion of fully deceptive functions in the family of trap functions is only O(log(l)/l), and that more than half of the trap functions are fully easy functions.
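For readers who have not met them, trap functions are usually defined over the unitation (the number of ones) of an l-bit string, with one slope climbing to a local optimum at the all-zeros string and another climbing to the global optimum at the all-ones string. The sketch below uses one common parameterization; the symbols a, b, and z and their default values are our choices, not necessarily those used in the paper.

    def trap(bits, a=0.8, b=1.0, z=None):
        """Trap function of unitation: value a at the all-zeros local optimum,
        value b at the all-ones global optimum, with the slope change at z ones
        (illustrative parameterization)."""
        l = len(bits)
        if z is None:
            z = l - 1                     # common choice: valley just below the global optimum
        u = sum(bits)                     # unitation: number of ones in the string
        if u <= z:
            return a * (z - u) / z        # slope climbing toward the all-zeros local optimum
        return b * (u - z) / (l - z)      # slope climbing toward the all-ones global optimum

    # The all-zeros string scores a, the all-ones string scores b, and for a
    # close to b most low-order schema averages favor the all-zeros side,
    # which is the kind of misleading information analyzed in the paper.
    print(trap([0] * 10), trap([1] * 10), trap([1] * 5 + [0] * 5))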
------------------------------

"Hierarchical automatic function definition in genetic programming"

John R. Koza
Computer Science Department, Stanford University, Stanford, CA 94305 USA
E-MAIL: Koza@Sunburn.Stanford.Edu
PHONE: 415-941-0336, FAX: 415-941-9430

Abstract: A key goal in machine learning and artificial intelligence is to automatically and dynamically decompose problems into simpler problems. This paper describes two extensions to genetic programming, called "automatic" function definition and "hierarchical automatic" function definition, wherein functions that might be useful in solving a problem are automatically defined on-the-fly during a run. The automatically defined functions are defined in terms of dummy variables so that they can be repeatedly called from the "main" result-producing part of the program with different instantiations of the dummy variables. In the "hierarchical" version of automatic function definition, automatically defined functions may call other automatically defined functions, thus creating a hierarchy of dependencies among the automatically defined functions. Thus, a problem can be decomposed into smaller problems in the form of a hierarchy of calls to automatically defined functions. The even-11-parity problem was solved using hierarchical automatic function definition.

------------------------------

"Recombination Distributions for Genetic Algorithms"

Lashon B. Booker
The MITRE Corporation, 7525 Colshire Drive, McLean, VA 22102-3481
booker@mitre.org

Abstract: Though genetic algorithms are loosely based on the principles of genetic variation and natural selection, the theory of mathematical genetics has not played a large role in most analyses of genetic algorithms. This paper reviews some well-known results in mathematical genetics that use probability distributions to characterize the effects of recombination on multiple loci in the absence of selection. The relevance of this characterization to genetic algorithm research is illustrated by using it to quantify certain inductive biases associated with crossover operators. The potential significance of this work for the theory of genetic algorithms is discussed.

------------------------------

"Deception Considered Harmful"

John J. Grefenstette
Navy Center for Applied Research in Artificial Intelligence, Code 5514, Naval Research Laboratory, Washington, DC 20375-5000
E-mail: GREF@AIC.NRL.NAVY.MIL

Abstract: A central problem in the theory of genetic algorithms is the characterization of problems that are difficult for GAs to optimize. Many attempts to characterize such problems focus on the notion of "Deception", defined in terms of the static average fitness of competing schemas. This article examines the Static Building Block Hypothesis (SBBH), the underlying assumption used to define Deception. Exploiting contradictions between the SBBH and the Schema Theorem, we show that Deception is neither necessary nor sufficient for problems to be difficult for GAs. This article argues that the characterization of hard problems must take into account the basic features of genetic algorithms, especially their dynamic, biased sampling strategy.

------------------------------

"Towards a Stronger Building-Blocks Hypothesis: Effects of Relative Building-Block Fitness on GA Performance"

Stephanie Forrest
Dept. of Computer Science, University of New Mexico, Albuquerque, NM 87131
forrest@unmvax.cs.unm.edu

Melanie Mitchell
AI Laboratory, University of Michigan, Ann Arbor, MI 48109
mm@santafe.edu

Abstract: The building-blocks hypothesis states that the GA works well when short, low-order, highly fit schemas recombine to form even more highly fit higher-order schemas. The ability to produce fitter and fitter partial solutions by combining building blocks is believed to be a primary source of the GA's search power, but the GA research community currently lacks precise and quantitative descriptions of how schema processing actually takes place during the typical evolution of a GA search. Another open problem is to characterize in more detail the types of fitness landscapes for which crossover will be an effective operator. In this paper we first describe a class of fitness landscapes (the "Royal Road" functions) that we have designed to investigate these questions. We then present some unexpected experimental results concerning the GA's performance on simple instances of these landscapes, in which we vary the strength of reinforcement from "stepping stones": fit intermediate-order schemas obtained by recombining fit low-order schemas.
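A Royal-Road-style function rewards a string for each predefined schema (contiguous block of bits) it fully contains. The minimal sketch below conveys the idea; the block size, string length, and coefficients are illustrative choices rather than the exact settings of the functions studied in the paper.

    def royal_road(bits, block_size=8):
        """Royal-Road-style fitness: the string is divided into contiguous
        blocks, and each block adds its length to the fitness only when every
        bit in it is set (illustrative coefficients)."""
        fitness = 0
        for start in range(0, len(bits), block_size):
            block = bits[start:start + block_size]
            if all(block):
                fitness += len(block)
        return fitness

    # A 64-bit string with two completed 8-bit blocks scores 16.
    x = [1] * 8 + [0] * 8 + [1] * 8 + [0] * 40
    print(royal_road(x))

Variants add rewards for intermediate-order schemas formed from combinations of completed blocks; varying how strongly those "stepping stones" are reinforced is the manipulation the abstract refers to.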
------------------------------

"Accounting for Noise in the Sizing of Populations"

David E. Goldberg, Kalyanmoy Deb, and James H. Clark
(Goldberg) goldberg@vmd.cso.uiuc.edu, (Deb) deb@gal1.ge.uiuc.edu

Abstract: This paper considers the effect of noise on the quality of convergence of genetic algorithms (GAs). A population-sizing equation is derived to ensure that minimum signal-to-noise ratios are favorable to the discrimination of the best building blocks required to solve a problem of bounded deception. In five test problems of varying degrees of nonlinearity, nonuniform scaling, and nondeterminism, the sizing relation proves to be a conservative predictor of average correct convergence. These results suggest how the sizing equation may be viewed as a coarse delineation of a boundary between two distinct types of GA behavior. Besides discussing a number of extensions of this work, the paper discusses how these results may one day lead to rigorous proofs of convergence for recombinative GAs operating on problems of bounded deception.

------------------------------

"SYNTACTIC ANALYSIS OF CONVERGENCE IN GENETIC ALGORITHMS"

Sushil J. Louis and Gregory J. E. Rawlins
louis@cs.indiana.edu, rawlins@cs.indiana.edu
Dept. of Computer Science, Lindley Hall 215, Indiana University, Bloomington, IN 47405

Abstract: We use the average Hamming distance of a population as a syntactic metric to obtain probabilistic bounds on the time convergence of genetic algorithms. Analysis of a flat function provides worst-case time complexity for static functions and gives a theoretical basis to the problem of premature convergence. We suggest simple changes that mitigate this problem and help fight deception. Further, employing linearly computable syntactic information, we can provide upper limits on the time beyond which progress is unlikely on an arbitrary function. Preliminary results support our analysis.

------------------------------

"Generation Gaps Revisited"

Kenneth A. De Jong and Jayshree Sarma
George Mason University, Fairfax, VA 22030, USA
kdejong@aic.gmu.edu, jsarma@cs.gmu.edu

Abstract: There has been a lot of recent interest in so-called "steady state" genetic algorithms (GAs) which, among other things, replace only a few individuals (typically 1 or 2) each generation from a fixed-size population of size N. Understanding the advantages and/or disadvantages of replacing only a fraction of the population each generation (rather than the entire population) was a goal of some of the earliest GA research. In spite of considerable progress in our understanding of GAs since then, the pros and cons of overlapping generations remain a somewhat cloudy issue. However, recent theoretical and empirical results provide the background for a much clearer understanding of this issue. In this paper we review, combine, and extend these results in a way that significantly sharpens our insight.
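To make the contrast with generational replacement concrete, here is a minimal sketch of one step of a steady-state GA, in which a single offspring replaces the current worst member of a fixed-size population (a generation gap of 1/N). The tournament selection, one-point crossover, and bit-flip mutation are illustrative choices, not the configurations analyzed in the paper.

    import random

    def tournament_select(pop, fitness, k=2):
        """Return the index of the best of k randomly chosen individuals."""
        contenders = random.sample(range(len(pop)), k)
        return max(contenders, key=lambda i: fitness[i])

    def steady_state_step(pop, fitness, evaluate, p_mut=0.01):
        """One steady-state step: breed a single offspring and insert it in
        place of the current worst individual, leaving the rest untouched."""
        i = tournament_select(pop, fitness)
        j = tournament_select(pop, fitness)
        cut = random.randrange(1, len(pop[i]))                  # one-point crossover
        child = pop[i][:cut] + pop[j][cut:]
        child = [b ^ (random.random() < p_mut) for b in child]  # bit-flip mutation
        worst = min(range(len(pop)), key=lambda n: fitness[n])
        pop[worst], fitness[worst] = child, evaluate(child)

    # Example: maximize the number of ones in a 20-bit string with N = 30.
    random.seed(0)
    pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
    fit = [sum(ind) for ind in pop]
    for _ in range(500):
        steady_state_step(pop, fit, evaluate=sum)
    print(max(fit))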
------------------------------

"Crossover or Mutation?"

William M. Spears
AI Center, Code 5510, Naval Research Laboratory, Washington, D.C. 20375
spears@aic.nrl.navy.mil

Abstract: Genetic algorithms rely principally on two genetic operators: crossover and mutation. Although there exists a large body of conventional wisdom concerning the roles of crossover and mutation, these roles have not been captured in a theoretical fashion. For example, it has never been theoretically shown that mutation is in some sense "less powerful" than crossover, or vice versa. This paper provides some answers to these questions by theoretically demonstrating that there are some important characteristics of each operator that are not captured by the other.

------------------------------

"Learning Boolean Functions with Genetic Algorithms: A PAC Analysis"

Johannes P. Ros
Computer Science Department, University of Pittsburgh, Pittsburgh, PA 15260
ros@cs.pitt.edu

Abstract: This paper presents a genetic algorithm for learning the classes of Boolean conjuncts and disjuncts (which include the sets of kCNF and kDNF formulas) in the context of the distribution-free learning model (the "probably approximately correct" or PAC model). A PAC learner produces with high probability an accurate representation for an unknown concept from training examples that are chosen at random from a fixed but arbitrary probability distribution. The analysis of the genetic PAC learner is complete: For any reasonable crossover operator and any confidence and accuracy level, the result provides the number of generations and the size of the population needed to accomplish this learning task.

------------------------------

"Is the genetic algorithm a cooperative learner?"

Helen Cobb
Navy Center for Applied Research in Artificial Intelligence, Code 5514, Naval Research Laboratory, Washington, DC 20375-5000
E-mail: cobb@AIC.NRL.Navy.Mil

Abstract: This paper begins to explore an analogy between the usual competitive learning metaphor presented in the genetic algorithm (GA) literature and the cooperative learning metaphor discussed by Clearwater et al. In a blackboard cooperative learning paradigm, agents share partial results with one another through a common blackboard. By occasionally accessing the blackboard for a partial solution, an agent can dramatically increase its speed in finding the overall solution to a problem. The study of Clearwater et al. shows that the resulting speed distribution among the agents is lognormal. The GA can also be described in terms of an analogous cooperative learning paradigm. Unlike the blackboard learner, the GA shares information by copying and recombining the solutions of the agents. This slower propagation of information is necessary because the GA cannot directly evaluate parts of a solution or "partial solutions." The extent to which the GA is cooperative also depends on the choice of heuristics used to modify the canonical GA. The few test cases presented in this paper suggest that the GA may at times yield an approximately lognormal distribution or a mixture of lognormal distributions. While the results look promising, more analysis of the algorithm's overall process is required.

------------------------------

"Simulated Crossover in Genetic Algorithms"

Gilbert Syswerda
BBN Laboratories, BBN Systems and Technologies, 10 Moulton Street, Cambridge, MA 02138

Abstract: A new class of crossover operator, simulated crossover, is presented. Simulated crossover treats the population of a genetic algorithm as a conditional variable to a probability density function that predicts the likelihood of generating samples in the problem space. A specific version of simulated crossover, bit-based simulated crossover, is explored. Its ability to perform schema recombination and reproduction is compared analytically against other crossover operators, and its performance is checked on several test problems.

------------------------------

"An Executable Model of a Simple Genetic Algorithm"

Darrell Whitley (whitley@CS.ColoState.EDU)

Abstract: A set of executable equations is defined which models the ideal behavior of a simple genetic algorithm. The equations assume an infinitely large population and require the enumeration of all points in the search space. When implemented as a computer program, the equations can be used to study problems of up to approximately 15 bits. These equations represent an extension of earlier work by Bridges and Goldberg. At the same time, these equations are a special case of a model introduced by Vose and Liepins. The various models are reviewed and the predictive behavior of the executable equations is examined. Then the executable equations are extended by quantifying the behavior of a reduced surrogate operator and a uniform crossover operator. In addition, these equations are used to study the computational behavior of a parallel island model implementation of the simple genetic algorithm.
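The flavor of these infinite-population equations can be seen in the selection-only case sketched below, where the expected proportion of each string after fitness-proportionate selection is its current proportion weighted by relative fitness. The full executable models add crossover and mutation terms over all pairs of strings, which is what limits them in practice to problems of roughly 15 bits; the code is our simplified illustration, not the equations from the paper.

    def proportional_selection_update(proportions, fitness):
        """Infinite-population expectation under fitness-proportionate selection:
        p_i(t+1) = p_i(t) * f_i / (average population fitness)."""
        avg = sum(p * f for p, f in zip(proportions, fitness))
        return [p * f / avg for p, f in zip(proportions, fitness)]

    # Example: a fully enumerated 2-bit search space (strings 00, 01, 10, 11).
    fitness = [1.0, 2.0, 2.0, 4.0]
    p = [0.25, 0.25, 0.25, 0.25]
    for generation in range(10):
        p = proportional_selection_update(p, fitness)
    print(p)  # the proportion mass concentrates on the fittest string, 11

------------------------------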