Now that we have this neural programming representation, we introduce a mechanism to accomplish internal reinforcement. Internal Reinforcement of Neural Programs (IRNP) has two main stages. The first stage labels each node and arc of a program with its perceived contribution to the program's output. We refer to this set of labels collectively as the Credit-Blame map. The second stage uses the Credit-Blame map to change the program in a way that is more likely than a random change to improve its performance.
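To make the two stages concrete, the following is a minimal sketch and not the procedure used in this paper. It assumes that each node already has a hypothetical contribution estimate in [-1, 1] (how such an estimate is obtained is exactly what the first stage must decide), that an arc inherits the mean of its two endpoint scores, and that the second stage simply biases the choice of mutation site toward low-credit nodes. The names Program, credit_blame_map, and pick_mutation_target are illustrative only.

```python
import random
from dataclasses import dataclass


@dataclass
class Program:
    nodes: list  # node identifiers
    arcs: list   # (source, target) pairs of node identifiers


def credit_blame_map(program, node_scores):
    """Stage 1 (sketch): label every node and arc with a score in [-1, 1].

    node_scores is a hypothetical, externally supplied estimate of each
    node's contribution; unscored nodes default to 0.0 (neutral).
    Assumption: an arc is scored as the mean of its two endpoints.
    """
    cb = {n: node_scores.get(n, 0.0) for n in program.nodes}
    for u, v in program.arcs:
        cb[(u, v)] = (cb[u] + cb[v]) / 2
    return cb


def pick_mutation_target(program, cb, rng):
    """Stage 2 (sketch): choose a node to change, favoring low-credit nodes.

    A score of 1 gives a weight near 0 and a score of -1 gives a weight
    near 2, so blamed nodes are selected more often than credited ones.
    """
    weights = [1.0 - cb[n] + 1e-9 for n in program.nodes]
    return rng.choices(program.nodes, weights=weights, k=1)[0]


# Toy usage: node "b" carries the most blame.
prog = Program(nodes=["a", "b", "c"], arcs=[("a", "b"), ("b", "c")])
cb = credit_blame_map(prog, {"a": 0.9, "b": -0.8, "c": 0.1})
rng = random.Random(0)
picks = [pick_mutation_target(prog, cb, rng) for _ in range(1000)]
```

In this toy run the heavily blamed node is proposed for change far more often than the credited one, which is the sense in which the change is more likely than a random one to help. A uniform choice over nodes would correspond to the unguided baseline.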
Which methods best accomplish the goals of internal reinforcement remains an open question in our research. We have already identified several methods for each of the two stages. For the sake of brevity and clarity, this paper focuses on only one technique for each stage.
The evolved NP programs considered here are part of the PADO system (described in [12, 13, 14]). For the purposes of this paper, it is sufficient to know that PADO is a machine learning system designed for signal classification. In PADO, an evolving program need only learn to discriminate one signal from the others in order to survive and be useful to the PADO system. The experiments we report to illustrate NP and IRNP are run in PADO.