From alnl-owner Wed Mar 4 08:33:24 1992
Received: from clvax1.cl.msu.edu ([35.8.2.1]) by scapa.cs.ualberta.ca with SMTP id <42585>; Wed, 4 Mar 1992 08:04:37 -0700
Date: Wed, 4 Mar 1992 08:02:00 -0700
From: "DOUG PIERCE CPS115"
Subject: Extracting rules from ALNs?
To: "alnl"
Message-Id: <92Mar4.080437mst.42585@scapa.cs.ualberta.ca>

Greetings! I am new to this list, and so will introduce myself. My academic background is: MS in Biostatistics, MS in Computer Science. Working right now as a data manager and statistician at a medical school. Have done lots of logic programming work (in Prolog).

I am interested in inductive synthesis (or whatever you want to call it -- extraction, etc.) of rules from ALNs. I don't see how one could possibly do this from any other type of net I know about. My bias: I think you have to be an EE at heart to love those other nets (which I am not at all, though I have spent lots of time studying BP and similar algorithms).

Is there anyone out there who has demonstrated, or who would like to attempt to demonstrate, that you can extract coherent rules from ALNs? It would not take a very complex example to make some judgment about this. However, I think this idea is closely linked to others that would benefit from more substantial examples. For one thing, I am excited to hear any mention of problem decomposition with ALNs, but I haven't seen any clear attempts to illustrate how this might work -- let me know if I'm missing something here, I wouldn't be surprised. I guess this would be a matter of using ALNs to find partitions or cover sets of input elements, then identifying hierarchical relationships within and between these sets. But that may be a pretty wild guess. If anyone has a clue about this, please let me know.

From alnl-owner Wed Mar 4 08:53:50 1992
Received: from clvax1.cl.msu.edu ([35.8.2.1]) by scapa.cs.ualberta.ca with SMTP id <42765>; Wed, 4 Mar 1992 08:42:41 -0700
Date: Wed, 4 Mar 1992 08:30:00 -0700
From: "DOUG PIERCE CPS115"
Subject: Looking for a good metaphor
To: "alnl"
Message-Id: <92Mar4.084241mst.42765@scapa.cs.ualberta.ca>

One thing that is missing (for me) in the ALN literature is a metaphor that captures in a succinct way the real strengths of this approach. The discussion of implementation problems and their associated theoretical problems is of course the real meat of this type of interchange. But... NNs and GAs (genetic algorithms) have their biological metaphors ready-made and have gotten lots of mileage from such associations. This is in spite of the extreme looseness of these associations. Proponents of ALNs have to look elsewhere for inspiration, I think. Actually, my favorite kind of inspiration comes from positive results! Which may turn out to be a distinguishing feature of ALNs. In any case, I think this is a significant issue, not just fluff. Any comments?

Doug Pierce

From alnl-owner Fri Mar 6 09:01:11 1992
Received: from sumac.ecn.purdue.edu ([128.46.161.181]) by scapa.cs.ualberta.ca with SMTP id <42505>; Fri, 6 Mar 1992 04:37:53 -0700
Received: by sumac.ecn.purdue.edu (5.65/1.30jrs) id AA10580; Fri, 6 Mar 92 06:37:22 -0500
Date: Fri, 6 Mar 1992 04:37:22 -0700
From: mccauley@ecn.purdue.edu (Darrell McCauley)
Message-Id: <9203061137.AA10580@sumac.ecn.purdue.edu>
To: alnl@cs.ualberta.ca
Subject: other appl of encoding/decoding

Has anyone used the encoding/decoding stuff in atree for learning paradigms other than adaptive logic? Has anyone investigated the consequences of using bit vectors for something that expects inputs in [0,1]?
Darrell McCauley           Department of Ag Engr, Purdue Univ
mccauley@ecn.purdue.edu    West Lafayette, Indiana 47907-1146

From alnl-owner Fri Mar 6 12:32:57 1992
Received: by scapa.cs.ualberta.ca id <42528>; Fri, 6 Mar 1992 12:09:14 -0700
From: Bill Armstrong
To: alnl
Subject: using ALNs on quantized reals
Cc: arms
Message-Id: <92Mar6.120914mst.42528@scapa.cs.ualberta.ca>
Date: Fri, 6 Mar 1992 12:08:35 -0700

I would like to make some points about the use of adaptive logic networks to solve problems with continuous values as inputs and/or outputs.

atree release 2.0 has facilities for encoding quantized reals into boolean vectors automatically. A random walk technique is used: one specifies how many bits are used to represent a value (e.g. 20), and how many bits are changed to go from the code for one value to the code for the next higher quantization level (e.g. 1). This is called a 20:1 code. Encoding is easily done by a table, but decoding is done by linear search, which is *very* inefficient. Allen Supynuk has done decoding far more efficiently using Golay codes. Note that the output variable below has a coding of 16:3, where it is hoped that the 3 will lead to fewer errors in decoding.

Here is part of a sample program for lf for some medical data:

    tree size = 8191
    min correct = 122
    max epochs = 20
    function
    domain dimension = 29
    largest = 200 200 100 200 100 100 100 100 120 100 100 300 150 10 150 10 0 3000 6000 2000 2000 20 20 200 100 20 5 50 50 5 20
    smallest = 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 -2000 -2000 -2000 -2000 0 0 0 0 0 0 0 0 0 0
    coding = 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 20:1 16:3
    quantization = 101 101 101 101 101 101 101 101 121 101 51 151 151 101 15 1 101 101 101 101 101 101 101 101 101 21 51 51 51 51 21
    training set size = 122
    training set = 69 102 55 68 43 24 31 17 0 3.45 17 158 92.8 1.68 50 24.4 1182 2423 324 664 3.2 1.6 46.2 22.5 1.45 0.71 21 .1 10.3 2.05 13
    etc. ...
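[For readers who have not seen these m:k codes, here is a minimal sketch of the idea in C. This is illustrative code only, not the atree implementation; all names are invented, and a real generator would also have to keep the codewords distinct, which is omitted here.]

    #include <stdio.h>
    #include <stdlib.h>

    #define NBITS   20   /* bits per codeword: the "20" in a 20:1 code  */
    #define KSTEP   1    /* bits flipped per quantization step: the "1" */
    #define NLEVELS 101  /* number of quantization levels               */

    static unsigned char code[NLEVELS][NBITS];

    /* Build the table: each level's codeword differs from the previous
       level's in KSTEP randomly chosen bit positions.  (With KSTEP > 1
       a position could be drawn twice; ignored in this sketch.) */
    static void build_walk(void)
    {
        int lev, i, k;
        for (i = 0; i < NBITS; i++)
            code[0][i] = rand() & 1;
        for (lev = 1; lev < NLEVELS; lev++) {
            for (i = 0; i < NBITS; i++)
                code[lev][i] = code[lev - 1][i];
            for (k = 0; k < KSTEP; k++)
                code[lev][rand() % NBITS] ^= 1;
        }
    }

    /* Encoding is just a table lookup ... */
    static const unsigned char *encode(int level) { return code[level]; }

    /* ... but decoding must search for the nearest codeword: O(NLEVELS)
       Hamming comparisons per decode -- the inefficiency noted above. */
    static int decode(const unsigned char *v)
    {
        int lev, i, d, best = 0, bestd = NBITS + 1;
        for (lev = 0; lev < NLEVELS; lev++) {
            for (d = 0, i = 0; i < NBITS; i++)
                d += v[i] != code[lev][i];
            if (d < bestd) { bestd = d; best = lev; }
        }
        return best;
    }

    int main(void)
    {
        build_walk();
        printf("level 42 decodes to %d\n", decode(encode(42)));
        return 0;
    }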
I now believe that the random walk encoding technique is the wrong way to use ALNs on continuous inputs; one should just compute a threshold like xi >= tj or xi < tj to get each input to the adaptive tree. The problem with random walk encoding lies not only in the somewhat complex decoding step, but in the fact that the boolean functions produced by the random walk encodings cannot be forced to be monotonically increasing or decreasing when this is necessary. Namely, if you look at what happens to, say, the 5th bit of the 20 used above, it may change back and forth between 0 and 1 several times as one goes sequentially through all the quantization levels of the original real. Hence the function produced by the tree can do the same.

I feel monotonicity is an extremely important tool for assuring that the function synthesized by an ALN is within the specified range of values for *all* possible inputs. Otherwise the network might be very unsafe to use. There are in general two ways I can think of for developing safe neural nets.

1) You might be able to show, using a bound on the magnitude of the weights in an MLP, that you have covered the set of possible inputs adequately in testing, so that between test points the function can only vary by some acceptable epsilon. For systems that use approximate arithmetic and table lookup, those factors also have to be taken into account. For low dimensional spaces where each variable is quantized to few values, this technique may provide a solution. In many cases, though, the input space is of higher dimension, e.g. 10, and each dimension could be quantized to 100 levels. Already this gives 100^10 points in the input space, which is too many to test individually. If the required epsilon allows you to cover 5^10 points by testing just one at the centre of a ball, you still have 20^10 tests to make, which is still ten million million.

2) You could try to force monotonicity, so that the greatest and least values of the network output are guaranteed to be taken on at two corners of a parallelepiped in the space of inputs, which you make test points. In the usual MLPs, where the function to be learned is not monotonic, you will have both positive and negative weights influencing the output for *every* input. Hence you can't guarantee a safe system by this means.

If you want a safe ALN system, I suggest you break the input space up into (perhaps overlapping) parts using either conditions on the input variables themselves, e.g. x < 15 or x >= 10, or some computed functions c(x1,...,xn) < 0, etc. In each of the parts so created, your a priori knowledge should suggest that the function to be learned can be assumed monotonic in each input separately. The parts learned and tested separately can then be safely combined by higher level gates, with very little additional cost in hardware or time.

You set up the monotonic problems so that instead of learning the value of the function f, you learn a predicate P(x1,...,xn,y), where the value of P is 1 if y < f(x1,...,xn) and is 0 otherwise. Then you force the ALN trained on that part to be appropriately monotonic by using as inputs to the adaptive tree thresholds of only one direction (< or >=, but not both), the direction depending on the direction of monotonic increase in the variable concerned. In particular, y only gets thresholds of the form y < tj (otherwise binary search won't work). Now the predicate answers (by giving a 1) questions of the form "is the function value for x1,...,xn greater than y?" If we want only a decision like this, we are finished. If we want an approximation to the actual f(x1,...,xn), we use the same tree iteratively, each time changing just y to perform binary search on y. The number of steps of search depends on the precision to which you need the function value.

This technique can allow one to develop safe solutions to problems, and to avoid the complexity and obscurity involved in using the random walks. The current atree code doesn't make it easy to develop safe solutions, so I can only hope that people using ALNs and other MLPs are somehow taking steps to guarantee safety where it is required. The currently used "black box" technique of train and test is definitely risky unless additional steps are taken.
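[To spell the search out, here is a minimal sketch in C. It is hypothetical code, not part of atree; the stand-in predicate P plays the role of a trained tree whose output is 1 exactly when y < f(x).]

    #include <stdio.h>

    /* Stand-in for a trained ALN predicate: returns 1 iff y < f(x).
       Here f is a known function so the sketch is self-contained;
       in practice P would be one evaluation of the logic tree. */
    static int P(double x, double y) { return y < x * x + 1.0; }

    /* Binary search on y over [lo, hi]: because the tree only sees
       thresholds of the form y < tj, P is monotonic (nonincreasing)
       in y, so the point where P drops from 1 to 0 is f(x). */
    static double approx_f(double x, double lo, double hi, int steps)
    {
        int i;
        for (i = 0; i < steps; i++) {
            double mid = 0.5 * (lo + hi);
            if (P(x, mid))
                lo = mid;   /* mid < f(x): move up    */
            else
                hi = mid;   /* mid >= f(x): move down */
        }
        return 0.5 * (lo + hi);
    }

    int main(void)
    {
        /* Each call to P costs one pass through the tree; the loop
           count is the "number of steps of search" in the text. */
        printf("f(3) is approximately %g\n",
               approx_f(3.0, 0.0, 100.0, 30));
        return 0;
    }

[Run over one of the x's with y held fixed instead, the same loop gives the inverse-function and constraint-style queries taken up in the next message.]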
From alnl-owner Wed Mar 11 23:53:39 1992
Received: by scapa.cs.ualberta.ca id <42672>; Wed, 11 Mar 1992 18:17:53 -0700
From: Bill Armstrong
To: alnl
Message-Id: <92Mar11.181753mst.42672@scapa.cs.ualberta.ca>
Date: Wed, 11 Mar 1992 18:17:32 -0700

Doug Pierce writes (in the message with Subject: "Extracting rules from ALNs?", Message-Id: <92Mar4.080437mst.42585@scapa.cs.ualberta.ca>) in introducing himself to the alnl list:

> Have done lots of logic programming work (in Prolog).

> For one thing, I am excited to hear any mention of problem decomposition
> with ALNs, but I haven't seen any clear attempts to illustrate how this
> might work -- let me know if I'm missing something here, I wouldn't be
> surprised. I guess this would be a matter of using ALNs to find
> partitions or cover sets of input elements, then identifying
> hierarchical relationships within and between these sets.

The use of predicates to efficiently represent continuous-valued functions is related to Prolog, i.e. y = f(x) is represented as P(x,y), where the logic tree of ANDs and ORs gets thresholds x < t1, x >= t2, y < t3, etc. as inputs. Since thresholds of the form y >= t4 are not used, we can do binary search to find y given x.

If we decompose the space hierarchically, with ALNs as you suggest or just with simple conditions like thresholds or linear discriminants, then in each part we can impose a condition of monotonicity by restricting the thresholds on x to one direction, e.g. x >= t5. Now, given y, we can find in any part the least x such that P(x,y) = 1. This provides an inverse of the monotonic function on the part. The details are tedious to work out, but the idea is that the ALN approach enables you to instantiate any variables in a predicate and then compute the constraints that apply to the remaining variables. E.g. with P(x,y,z,w), you can instantiate x and w and still be able to find, for each y, the least (or greatest) value of z which makes P = 1. So I think ALNs fit nicely into the framework of newer versions of Prolog that emphasize constraint solving.

> (I am ...) interested in inductive synthesis (or whatever you want to
> call it -- extraction, etc.) of rules from ALNs. I don't see how one
> could possibly do this from any other type of net I know about.

I think it is an NP-complete problem to decide if a given logic tree will ever produce a 1 output (satisfiability). Hence the general problem of telling what an ALN does is intractable. Since ALNs are special cases of multilayer perceptrons, it is also an intractable problem to tell what an MLP does.

> My bias: I think you have to be an EE at heart to love those other nets
> (which I am not at all, though I have spent lots of time studying BP and
> similar algorithms).

On the contrary, I think EEs, who are very familiar with switching theory, should love ALNs. Maybe numerical analysts should love BP nets; after all, BP is just approximation theory with a particular twist based on Kolmogorov's theorem.

From Doug's second message:

> One thing that is missing (for me) in the ALN literature is a metaphor
> that captures in a succinct way the real strengths of this approach.

ALNs are fast, simple and can support the design of safe systems. Until we get a good metaphor, if there is one, "fast, simple, and safe" is a good characterization.

> NNs and GAs (genetic algorithms) have their biological metaphors
> ready-made and have gotten lots of mileage from such associations. This
> is in spite of the extreme looseness of these associations.

I would also like to think that brain theory people would prefer an artificial neural network where all the signals are 0 or 1, like action potentials in axons and dendrites. Some of the adaptation algorithms I have tried over the years have a flavor much closer to (my understanding of) biological systems than either backprop or the published ALN algorithms. However, given a choice between something that looks biological and something that produces a near optimal solution to a statistical pattern recognition problem, I would invariably choose the latter. (After all, why aspire to merely *equal* biological brains' power? There is a real need for something better -- don't you agree?)
> Proponents of ALNs have to look elsewhere for inspiration, I think.
> Actually, my favorite kind of inspiration comes from positive results!
> Which may turn out to be a distinguishing feature of ALNs.

Hope so.

> In any case, I think this is a significant issue, not just fluff.

As someone who would like the ALN field to flourish, I agree. Thanks for your thoughtful comments.

Bill

From alnl-owner Thu Mar 12 22:02:17 1992
Received: from bramble.ecn.purdue.edu ([128.46.161.174]) by scapa.cs.ualberta.ca with SMTP id <42234>; Thu, 12 Mar 1992 21:42:55 -0700
Received: by bramble.ecn.purdue.edu (5.65/1.30jrs) id AA05210; Thu, 12 Mar 92 23:42:15 -0500
Date: Thu, 12 Mar 1992 21:42:15 -0700
From: mccauley@ecn.purdue.edu (Darrell McCauley)
Message-Id: <9203130442.AA05210@bramble.ecn.purdue.edu>
To: alnl@cs.ualberta.ca
Subject: quantization as a filter?

I am completing a test of adaptive logic networks for a particular application (predicting a physical property using image features). It *appears* (meaning I haven't fully verified this) that ALNs did much better at predicting than statistical regression. Now, I'm trying to figure out/explain why.

One theory is that the quantization of real-valued inputs by the atree software may have acted as a filter, or, in some other way, was beneficial in function mapping. Has this notion ever been suggested by anyone? References?

Darrell
--
James Darrell McCauley     Department of Ag Engr, Purdue Univ
mccauley@ecn.purdue.edu    West Lafayette, Indiana 47907-1146

From alnl-owner Fri Mar 13 10:34:43 1992
Received: by scapa.cs.ualberta.ca id <42466>; Fri, 13 Mar 1992 10:17:17 -0700
Subject: Re: quantization as a filter?
From: Bill Armstrong
To: mccauley@ecn.purdue.edu (Darrell McCauley)
Date: Fri, 13 Mar 1992 10:16:58 -0700
Cc: alnl
In-Reply-To: <9203130442.AA05210@bramble.ecn.purdue.edu>; from "Darrell McCauley" at Mar 12, 92 9:42 pm
X-Mailer: ELM [version 2.3 PL11]
Message-Id: <92Mar13.101717mst.42466@scapa.cs.ualberta.ca>

> I am completing a test of adaptive logic networks for a particular
> application (predicting a physical property using image features). It
> *appears* (meaning I haven't fully verified this) that ALNs did much
> better at predicting than statistical regression. Now, I'm trying to
> figure out/explain why.
>
> One theory is that the quantization of real-valued inputs by the atree
> software may have acted as a filter, or, in some other way, was
> beneficial in function mapping. Has this notion ever been suggested by
> anyone? References?
>
> Darrell
> --
> James Darrell McCauley     Department of Ag Engr, Purdue Univ
> mccauley@ecn.purdue.edu    West Lafayette, Indiana 47907-1146

You don't say whether you are using linear regression or not. If the problem is non-linear, why shouldn't a flexible mechanism like ALNs do much better? The quantization mechanism may act to filter out small amounts of noise, but really, if the coding of reals into booleans is done using random walks, it should merely preserve (at least locally) the relative distances between quantization levels. I don't see why it would improve things much over an appropriate statistical technique. Anyway, I hope the result holds up when you verify it.

Bill
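[The "at least locally" caveat above is easy to check numerically: with one bit flipped per level, the Hamming distance between two codewords grows with the level difference only until the walk starts revisiting bits. A tiny self-contained check in C -- illustrative code, not atree's:]

    #include <stdio.h>
    #include <stdlib.h>

    #define NLEVELS 101

    static unsigned long walk[NLEVELS];

    /* Hamming distance between two codewords held in bitmasks */
    static int hamming(unsigned long a, unsigned long b)
    {
        unsigned long x = a ^ b;
        int n = 0;
        while (x) { n += x & 1; x >>= 1; }
        return n;
    }

    int main(void)
    {
        int i;
        /* a 20-bit random walk, one bit flipped per level (20:1 code) */
        walk[0] = 0;
        for (i = 1; i < NLEVELS; i++)
            walk[i] = walk[i - 1] ^ (1UL << (rand() % 20));

        /* Distance from level 50 to nearby levels: roughly proportional
           to the level difference while it is small, but it saturates
           (and can even shrink) once bit positions get reused. */
        for (i = 50; i <= 60; i++)
            printf("levels 50 vs %d: distance %d\n",
                   i, hamming(walk[50], walk[i]));
        return 0;
    }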
From alnl-owner Fri Mar 27 10:01:32 1992
Received: from clover.ecn.purdue.edu ([128.46.161.152]) by scapa.cs.ualberta.ca with SMTP id <42585>; Fri, 27 Mar 1992 09:29:49 -0700
Received: by clover.ecn.purdue.edu (5.65/1.30jrs) id AA05643; Fri, 27 Mar 92 11:29:16 -0500
Date: Fri, 27 Mar 1992 09:29:16 -0700
From: mccauley@ecn.purdue.edu (Darrell McCauley)
Message-Id: <9203271629.AA05643@clover.ecn.purdue.edu>
To: alnl@cs.ualberta.ca
Subject: interesting error trend with atree

Howdy,

I recently wrote that I was using atree to predict a physical property using image features. After running a series of trials in which I gradually increased the "min correct" parameter in 'lf' and plotting prediction error over the range of the physical property, I noticed a puzzling but interesting trend. [I'm going to take a little bit to explain this, so bear with me. I've found that if you ever ask netland for help, you had better be as descriptive as possible.]

Generally, the function mapping resulted in over-prediction for low values and under-prediction for high values (which is probably due to the features that I am using and the distribution of the data set). Try to visualize a graph of prediction error against the actual values. If a regression line is plotted, it has a negative slope. Now, for a relatively low value of "min correct," many of the points in the prediction error graph lie almost on a straight line! Bothersomely so (if that's a word). As I gradually increased "min correct," this predictability of the error went away. For the best case (lowest mean prediction error), there's not really any trend.

In other words, fitting (predicted - actual) = m(actual) + b resulted in a higher R-square for the lower values of "min correct" and a lower R-square for higher values (from 0.6 down to 0.2). The slope (m) decreased with increasing "min correct." (I know this may be hard to visualize - I can send a 30KB PostScript graph if you are genuinely interested.) BTW, the average difference between predicted and actual decreased with increasing "min correct" (as you would expect).

If you've got this picture in your head, you can see why this is disturbing---the average error is higher but *much* more predictable when the network is required to learn fewer patterns.

My theory is that (1) I am not using enough (or the right) features to characterize low/high values of the physical property, and (2) the decreasing predictability of the error trend with more training patterns (recall R-square decreases from 0.6 to 0.2) may indicate some form of over-training (too many patterns, not too many iterations). I could have a "whole heap of" outliers, but previous studies with this type of data indicate high levels of variability/unpredictability. The straight line in the graph I described for lower "min correct" may indicate that, with careful study, I could just post-process the data to correct the error.

Well, I thought that this was interesting. If you have any comments, please send them.

Darrell
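[For concreteness, the fit described above is an ordinary least-squares regression of the prediction error on the actual value. A minimal sketch in C with invented toy numbers, not Darrell's data:]

    #include <stdio.h>

    /* Fit err = m*actual + b by least squares and report R-squared,
       the statistic quoted above (0.6 falling to 0.2). */
    static void fit(const double *actual, const double *err, int n)
    {
        double sx = 0, sy = 0, sxx = 0, sxy = 0, ss_res = 0, ss_tot = 0;
        double m, b, ybar;
        int i;
        for (i = 0; i < n; i++) {
            sx  += actual[i];
            sy  += err[i];
            sxx += actual[i] * actual[i];
            sxy += actual[i] * err[i];
        }
        m = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        b = (sy - m * sx) / n;
        ybar = sy / n;
        for (i = 0; i < n; i++) {
            double r = err[i] - (m * actual[i] + b);
            ss_res += r * r;
            ss_tot += (err[i] - ybar) * (err[i] - ybar);
        }
        printf("m = %g, b = %g, R^2 = %g\n", m, b, 1.0 - ss_res / ss_tot);
    }

    int main(void)
    {
        /* toy data showing over-prediction at low values and
           under-prediction at high values (negative slope) */
        double actual[] = { 1, 2, 3, 4, 5, 6 };
        double err[]    = { 0.9, 0.5, 0.2, -0.1, -0.6, -0.8 };
        fit(actual, err, 6);
        return 0;
    }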
From alnl-owner Mon Mar 30 07:48:13 1992
Received: from clvax1.cl.msu.edu ([35.8.2.1]) by scapa.cs.ualberta.ca with SMTP id <42434>; Mon, 30 Mar 1992 07:27:02 -0700
Date: Mon, 30 Mar 1992 07:26:00 -0700
From: "DOUG PIERCE CPS115"
Subject: RE: interesting error trend with atree
To: "mccauley"
cc: "alnl"
Message-Id: <92Mar30.072702mst.42434@scapa.cs.ualberta.ca>

==================================================================
From: mccauley@ecn.purdue.edu (Darrell McCauley)
Message-Id: <9203271629.AA05643@clover.ecn.purdue.edu>
To: alnl@cs.ualberta.ca
Subject: interesting error trend with atree

Howdy, I recently wrote that I was using atree to predict a physical property using image features. After running a series of trials in which I gradually increased the "min correct" parameter in 'lf' and plotting prediction error over the range of the physical property, I noticed a puzzling but interesting trend.

etc.........

Darrell
================================================================

This is very interesting, but I am a bit confused about what is changing and what is constant. You say:

    If you've got this picture in your head, you can see why this
    is disturbing---the average error is higher but *much* more
    predictable when the network is required to learn fewer patterns.

Before I got to this point, I thought the error trend was dependent on your increasing the "min correct" parameter, *not* the number of patterns. Which is it? If the number of patterns enters in here, I would guess that for small numbers, the net is memorizing the individual patterns, and thus the "error" terms are systematic, not random.

I would also like to know how you processed the images (for other reasons entirely). Did you digitize photographs, and if so, how did you transform a grey-scale image? Did you end up with a grid of 0's and 1's (black/white), or did you use grey-scale values as input to the ALN? What was the resolution (X*Y)? Did you have to reduce the size of the inputs to cope with processing time? The reason I ask is that I am doing a project on recognition of facial features, and will have to deal with these issues very soon. Thanks for any ideas. I am hoping that ALNs will be much faster than other net types for this application.

Doug

From alnl-owner Mon Mar 30 11:45:10 1992
Received: from brier.ecn.purdue.edu ([128.46.161.175]) by scapa.cs.ualberta.ca with SMTP id <42143>; Mon, 30 Mar 1992 11:27:16 -0700
Received: by brier.ecn.purdue.edu (5.65/1.30jrs) id AA09272; Mon, 30 Mar 92 13:26:43 -0500
Date: Mon, 30 Mar 1992 11:26:43 -0700
From: mccauley@ecn.purdue.edu (Darrell McCauley)
Message-Id: <9203301826.AA09272@brier.ecn.purdue.edu>
To: alnl@cs.ualberta.ca
In-Reply-To: "DOUG PIERCE CPS115"'s message of Mon, 30 Mar 1992 07:26:00 -0700 <92Mar30.072702mst.42434@scapa.cs.ualberta.ca>
Subject: interesting error trend with atree

> This is very interesting, but I am a bit confused about what is
> changing and what is constant. You say:
>
>     If you've got this picture in your head, you can see why this
>     is disturbing---the average error is higher but *much* more
>     predictable when the network is required to learn fewer patterns.
>
> Before I got to this point, I thought the error trend was
> dependent on your increasing the "min correct" parameter, *not*
> the number of patterns. Which is it?

Yes? Isn't "min correct" the number of patterns that the net is *required* to learn? (I.e., adapt until you have learned "min correct" patterns or reach "max epochs," then predict for the test set and exit.)
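[In other words, the control flow being described is roughly the following. This is a paraphrase in C of the behaviour discussed in this thread, not the actual lf source; the helper names and counter bounds are invented:]

    #include <stdio.h>

    #define TRAIN_SIZE 122

    /* Stubs standing in for the real evaluate/adapt machinery. */
    static int learned[TRAIN_SIZE];
    static void present_pattern(int i) { learned[i] = 1; /* adapt tree */ }
    static int pattern_correct(int i) { return learned[i]; }

    int main(void)
    {
        int min_correct = 122, max_epochs = 20; /* as in the sample lf file */
        int epoch = 0, correct = 0, i;

        /* "min correct" is a target count of training patterns answered
           correctly, not a training-set size: adapt until that many of
           the (fixed) patterns are right, or until "max epochs" passes
           over the training set have been made. */
        while (correct < min_correct && epoch < max_epochs) {
            correct = 0;
            for (i = 0; i < TRAIN_SIZE; i++) {
                present_pattern(i);
                if (pattern_correct(i))
                    correct++;
            }
            epoch++;
        }
        printf("stopped after %d epoch(s), %d correct\n", epoch, correct);
        /* then predict for the test set and exit */
        return 0;
    }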
To clarify: the number of patterns in the training set never changed, only the "min correct" number.

> If the number of patterns enters in here, I would guess that for
> small numbers, the net is memorizing the individual patterns, and
> thus the "error" terms are systematic, not random.

This could be, but does anyone have a clue why I saw the particular pattern I did?

From alnl-owner Mon May 4 19:42:50 1992
Received: from thistle.ecn.purdue.edu ([128.46.161.178]) by scapa.cs.ualberta.ca with SMTP id <42552>; Mon, 4 May 1992 19:28:37 +0100
Received: by thistle.ecn.purdue.edu (5.65/1.30jrs) id AA25023; Mon, 4 May 92 20:28:07 -0500
Date: Mon, 4 May 1992 19:28:07 +0100
From: mccauley@ecn.purdue.edu (Darrell McCauley)
Message-Id: <9205050128.AA25023@thistle.ecn.purdue.edu>
To: alnl@cs.ualberta.ca
Subject: aln application paper available

The following paper (11 pages in length) is available via anonymous ftp from

    pasture.ecn.purdue.edu:pub/mccauley/mccauley.beef.ps.Z

and

    archive.cis.ohio-state.edu:pub/neuroprose/mccauley.beef.ps.Z

A shorter version was submitted to Transactions of the ASAE. Any comments or questions should be sent to mccauley@ecn.purdue.edu. Though I cannot mail hardcopies, I may be willing to e-mail compressed, uuencoded PostScript versions.

-------------------------------------------------------------------------

FAT ESTIMATION IN BEEF ULTRASOUND IMAGES USING TEXTURE AND ADAPTIVE LOGIC NETWORKS

James Darrell McCauley, USDA Fellow, Dept of Agricultural Engineering, Purdue University (mccauley@ecn.purdue.edu)
Brian R. Thane, Graduate Student, Dept of Agricultural Engineering, Texas A&M University (thane@diamond.tamu.edu)
A. Dale Whittaker, Assistant Professor, Dept of Agricultural Engineering, Texas A&M University (dale@diamond.tamu.edu)

Overviews of Adaptive Logic Networks and co-occurrence image texture are presented, along with a brief synopsis of instrument grading of beef. These tools are used for both prediction and classification of intramuscular fat in beef from ultrasonic images of both live beef animals and slaughtered carcasses. Results showed that Adaptive Logic Networks perform better than any fat prediction method for beef ultrasound images to date and are a viable alternative to statistical techniques.

\keywords{Meat, Grading, Automation, Ultrasound Images, Neural Networks.}

--
James Darrell McCauley     Department of Ag Engr, Purdue Univ
mccauley@ecn.purdue.edu    West Lafayette, Indiana 47907-1146
** "Do what is important first, then what is urgent." (unknown) **

From alnl-owner Mon May 11 00:48:34 1992
Received: by scapa.cs.ualberta.ca id <42481>; Mon, 11 May 1992 00:37:39 +0100
From: Bill Armstrong
To: alnl
Subject: new version for Windows
Message-Id: <92May11.003739mdt.42481@scapa.cs.ualberta.ca>
Date: Mon, 11 May 1992 00:37:20 +0100

There is a new version of atree for Windows on the PC, atree release 2.5, available via ftp from menaik.cs.ualberta.ca [129.128.4.241] in directory pub. If you just want to try running it, you can get the file a25exe.exe. This is self-extracting: copy it into your main directory on a PC and run it. It creates a directory atree_25 which contains the executables.

This is a much enhanced version, and you will enjoy the demonstration of noise immunity shown by the ocr demo. Try running ocr.exe and see how your eyes compete with the ALNs in distinguishing the characters in the presence of noise! If you intend to do some programming, you can get the full source in atre25.exe, which also contains everything the shorter version does.
Both files are also available on CompuServe in the WINADV and AIEXPERT forums.

ALNs have had great successes lately in several areas: beef grading, classifying particles at an accelerator facility, and designing walking prostheses for spinal-cord damaged patients. To date we have distributed over 1400 direct copies of the atree release 2.0 software on the Internet and over 800 on CompuServe. ALNs will be taught in some university courses as an alternative approach to backprop.

The Unix version is still release 2.0, but this will be upgraded soon to include some of the new features of release 2.5 under Windows.

Good luck with release 2.5!

Bill

PS atree release 2.5 was done by Monroe Thomas, who deferred taking on a steady job so he could work on it and on a commercial version that may be ready in the Fall. Thanks Monroe!

From alnl-owner Wed Jun 17 14:28:31 1992
Received: from mail.uunet.ca ([142.77.1.1]) by scapa.cs.ualberta.ca with SMTP id <42453>; Wed, 17 Jun 1992 14:15:27 +0100
Received: from mitel by mail.uunet.ca with UUCP id <10018>; Wed, 17 Jun 1992 16:14:30 -0400
Received: from Software.Mitel.COM (spock) by Mitel.COM (4.1/SMI-4.1) id AA04031; Wed, 17 Jun 92 12:26:09 EDT
Received: from msn506 by Software.Mitel.COM (4.0/SMI-4.0) id AA28176; Wed, 17 Jun 92 12:26:08 EDT
Date: Wed, 17 Jun 1992 10:26:08 +0100
From: Gary.Murphy@Software.Mitel.COM (Gary Murphy)
Message-Id: <9206171626.AA28176@Software.Mitel.COM>
To: alnl@cs.ualberta.ca
Subject: AI.CD-ROM

From alnl-owner Sat Jul 4 00:28:12 1992
Received: by scapa.cs.ualberta.ca id <42429>; Sat, 4 Jul 1992 00:13:48 +0100
From: Bill Armstrong
To: alnl
Subject: atree release 2.6 is available now
Message-Id: <92Jul4.001348mdt.42429@scapa.cs.ualberta.ca>
Date: Sat, 4 Jul 1992 00:13:41 +0100

Dear alnl subscriber:

The adaptive logic network (ALN) simulation package atree release 2.6 is available on menaik. It runs on IBM PCs and compatibles under Windows 3.x. Included are documentation and ON-LINE HELP that will help you understand the basic principles of adaptive logic networks and enable you to try out some examples of your own creation.

This package is not a toy. Experimenters are using it on challenging problems of medicine, physics and the environment. It is possible to use off-the-shelf programmable logic devices to realize the results of training these logic networks in very high-speed hardware, though we do not provide those facilities here. Please read the license, which among other things allows only non-commercial use. A commercial version is planned for later release.

Atree release 2.6 is available in either of two files in the pub/ directory: atre26.exe and a26exe.exe. The file atre26.exe contains the full C and C++ sources for those who want to program some applications using Borland C++ 3.x and Application Frameworks, or some similar C/C++ development environment. The other, smaller file contains just the executables for those who just want to try out atree release 2.6.

It is recommended that you execute atre26.exe in your main directory, whereupon it will create a subdirectory atree_26 and extract everything into it. Running "setup" in that directory will create a group of icons you can use to invoke demos and the facilities for programming adaptive logic network applications in the lf language. The "Open" command gives you access to numerous instructive examples.

The new OCR demo in release 2.6 allows the user to create his own character shapes.
Rotations and translations of the characters and a large amount of "salt and pepper" noise can be overcome by the adaptive logic networks. We think you will be impressed.

The Unix version, atree release 2.0, will be upgraded soon to add some of the functionality of release 2.6. It is not yet capable of the nice color displays of the Windows version, but besides running on Unix workstations it can be easily ported to the Macintosh, the Amiga, and other machines.

Thanks to all those who have sent comments about their work on ALNs, and have helped us to develop new concepts at a rapid rate. We would appreciate your continued help in finding any problems in release 2.6. Thanks to all of you on the alnl mailing list. I hope the list will become more active now that real results are starting to appear in high energy physics, medicine, agriculture, etc.

Bill

From alnl-owner Thu Jul 16 14:36:27 1992
Received: by scapa.cs.ualberta.ca id <42906>; Thu, 16 Jul 1992 14:18:13 +0100
Subject: Problems with atree 2.0 and 2.6
From: Bill Armstrong
To: alnl
Date: Thu, 16 Jul 1992 14:18:02 +0100
X-Mailer: ELM [version 2.3 PL11]
Message-Id: <92Jul16.141813mdt.42906@scapa.cs.ualberta.ca>

Shyam Kamadolli pointed out some problems with atree 2.0 and 2.6:

> 1) atree 2.0 on UNIX
>
> My problem is that the histogram generating script complains about the
> line being too long after I run the lf routine on my training and test sets.

Rolf Manderscheid suggests trying gawk (gnu-awk) instead of awk if you have it. Let me know what happens please.

----------------------------------

I did try gawk.. it WORKS. Thanks again.

2) atree 2.6

In lfBatch, if you try to change directories, the application dies. This is a bug, which will be corrected soon.

If others have found any problems, please let us know asap. Thanks.

Bill

From alnl-owner Fri Jul 24 14:58:00 1992
Received: by scapa.cs.ualberta.ca id <42376>; Fri, 24 Jul 1992 14:20:09 +0100
From: Bill Armstrong
To: alnl
Message-Id: <92Jul24.142009mdt.42376@scapa.cs.ualberta.ca>
Date: Fri, 24 Jul 1992 14:19:56 +0100

For those who would like to understand the ALN learning algorithm used in the atree demo software, here is a little writeup that should help. The best published introduction to ALN learning is the paper:

    W. Armstrong and J. Gecsei, "Adaptation Algorithms for Binary Tree Networks", IEEE Trans. on Systems, Man and Cybernetics, 9, 1979, pp. 276-285.

The only significant changes in atree 2.0 and 2.6 from that paper are that the heuristic responsibility is now a combination of true responsibility, error responsibility, and responsibility of both children of LEFT and RIGHT nodes. The details are all in the "adapt" routine in atree.c, shown in part below.

    private void adapt(tree, vec, desired)
    atree *tree;
    bit_vec *vec;
    bool_t desired;
    {
        register int lres;
        register int rres;
        register int funct;

        /*
         * INCR and DECR implement bounded counters.
         */
    #define INCR(t) ((t) += ((t) < MAXSET))
    #define DECR(t) ((t) -= ((t) > MINSET))

        funct = tree -> c_tag;

The c_tag is 0 for an AND node and 1 for an OR node. LEFT and RIGHT nodes have tags 2 and 3; which is which doesn't matter here. The node we are adapting is the root of a recursively defined "tree" datatype. First the node has to have its inputs evaluated, if that hasn't already been done. (Of course, we don't adapt leaves.)
    if (funct != LEAF) {
        /* If the left child is unevaluated, evaluate it */
        if ((lres = tree -> n_sig_left) == UNEVALUATED) {
            lres = atree_eval(tree -> n_child[LEFT_CHILD], vec);
            tree -> n_sig_left = lres;
        }

Now n_sig_left stores the result of evaluation on the left.

        /* If the right child is unevaluated, evaluate it */
        ...

Same thing on the other side.

        /* Update the counters if needed */

There are two counters, one associated with the (left,right) = (0,1) input pair and one with the (1,0) input pair. The appropriate counter is counted up or down depending on the desired network output. (Yes, the desired *network* output -- because all nodes are monotonic increasing functions, there is a positive correlation between the net output and any element's output.)

        if (lres != rres)

If there is a (0,0) or (1,1) input pair to the node, no adaptation occurs, according to the above condition.

        {
            if (lres) {
                (void) (desired ? INCR(tree -> n_cnt_left)
                                : DECR(tree -> n_cnt_left));
            }
            else {
                (void) (desired ? INCR(tree -> n_cnt_right)
                                : DECR(tree -> n_cnt_right));
            }

            /* has the node function changed? */

The values of the two counters determine the function of the node; e.g. if they are both above (below) their midpoints, it is an OR (AND) node. If they are on different sides of their midpoints, the node is a LEFT or RIGHT.

            tree -> c_tag = node_function(tree);
            funct = tree -> c_tag;
        }

        /* Assign responsibility to the left child */

This is symmetric to the responsibility assignment for the right child discussed below, so we don't need to discuss it.

        /* Assign responsibility to the right child */
        if (lres != desired || funct != (lres ? OR : AND)) {
            adapt(tree -> n_child[RIGHT_CHILD], vec, desired);
        }

This finishes the algorithm, except for the one line that must be understood to grasp the entire atree credit-assignment algorithm, and that is the condition in the last if-statement:

    lres != desired || funct != (lres ? OR : AND)

The current node is heuristically responsible when we get to this if-statement, and the question is which children should be h. resp.

If the left input is not equal to the desired network output (for a h. resp. node, lres != desired is called an "error" on the left), the right child is made responsible. The rationale is that when there is an error on the left, the left child may not be able to correct it (maybe it's a leaf), so the right child is given the chance.

If the node function is not OR or AND (i.e. it is LEFT or RIGHT, with funct == 2 or 3, which are never values of the expression (lres ? OR : AND)), then the right child is made h. resp. This says: try to put the subtrees that are cut off by a LEFT or RIGHT to some useful work. If such a cut-off subtree gets good enough, the node function might change to AND or OR to give that subtree an effective role in the net.

Finally, if the left input is 0 for an OR node or 1 for an AND node (i.e. if the right signal rres is enabled to pass through), then the right child is heuristically responsible. This last part of the credit assignment by h. resp., if implemented by itself, is true responsibility for the AND and OR nodes: a node is truly responsible if a change in its output would change the network's output. This component of heuristic responsibility is just hill-climbing or gradient descent. The two earlier components of h. resp. go beyond hill climbing to give the learning algorithm much more power than hill climbing alone.

Note that the left input controls the right child's responsibility, and vice-versa. This implies that every node in the tree can interact with every other node at every pattern presentation.
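[One consistent reading of the node_function step above is sketched below. This is illustrative C, not the atree source; the counter bounds are invented, and the orientation of the two counters may be the opposite of atree's -- as noted above, which is which doesn't matter.]

    #include <stdio.h>

    #define MINSET 0
    #define MAXSET 31
    #define MID ((MINSET + MAXSET) / 2)

    enum { AND, OR, LEFT, RIGHT };

    /* One counter per disagreeing input pair: cnt10 votes on what the
       node should output for inputs (1,0), cnt01 for (0,1).  AND and OR
       already agree on (0,0) and (1,1), so no counters are needed for
       those.  A counter above its midpoint votes for output 1. */
    static int node_function(int cnt10, int cnt01)
    {
        int on10 = cnt10 > MID, on01 = cnt01 > MID;
        if (on10 && on01)   return OR;   /* 1 on both mixed pairs  */
        if (!on10 && !on01) return AND;  /* 0 on both mixed pairs  */
        return on10 ? LEFT : RIGHT;      /* pass one input through */
    }

    int main(void)
    {
        printf("%d %d %d %d\n",
               node_function(MAXSET, MAXSET),  /* OR    */
               node_function(MINSET, MINSET),  /* AND   */
               node_function(MAXSET, MINSET),  /* LEFT  */
               node_function(MINSET, MAXSET)); /* RIGHT */
        return 0;
    }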
There is one other trick: if the network output is still wrong after the first adaptation, adapt again. This compensates for a shortage of errors to correct as the end of adaptation approaches.

(The code above was written by Rolf Manderscheid, to whom should go the glory or blame for putting a set of nested conditions into one line.)
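[In terms of the routines shown above, that trick might look like the following. This is a sketch only, not the actual atree driver code; it assumes the declarations in atree.h and omits details such as clearing the cached node signals between evaluations.]

    /* Per-pattern training step with the "adapt again" trick:
       evaluate, adapt on error, and if the network output is
       still wrong after re-evaluation, adapt a second time. */
    void train_on_pattern(atree *tree, bit_vec *vec, bool_t desired)
    {
        if (atree_eval(tree, vec) != desired) {
            adapt(tree, vec, desired);         /* first adaptation   */
            if (atree_eval(tree, vec) != desired)
                adapt(tree, vec, desired);     /* still wrong: again */
        }
    }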