Blackbox Model Selection



Vizier's Blackbox uses leave-one-out cross validation error (LOO-XVE) to determine which model is best for a particular data set. The full Blackbox algorithm can be summarized as follows:

1. Load the relevant data for the Cross Validation Expert.
2. Load the relevant data for the Cross Validation Police.
3. Generate a new metacode according to a search algorithm that is guided by the Cross Validation Expert.
4. Test the new metacode with cross validation and record the results.
5. While there is still time left, go back to step 3.
6. Select the metacode with the best evaluation according to the Cross Validation Police.
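
The six steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Vizier's implementation: the "metacode" is reduced to a single integer k for a k-nearest-neighbor average, the search is a simple random perturbation of the best k seen so far, a fixed step count stands in for the time limit, and the names blackbox_sketch, loo_error, and holdout_error are hypothetical.

```python
import random

def knn_predict(train, x, k):
    """Average the outputs of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def loo_error(train, k):
    """CV Expert stand-in: mean absolute leave-one-out error."""
    total = 0.0
    for i, (x, y) in enumerate(train):
        rest = train[:i] + train[i + 1:]   # leave point i out
        total += abs(knn_predict(rest, x, k) - y)
    return total / len(train)

def holdout_error(train, held_out, k):
    """CV Police stand-in: mean absolute error on points the Expert never saw."""
    errors = [abs(knn_predict(train, x, k) - y) for x, y in held_out]
    return sum(errors) / len(errors)

def blackbox_sketch(data, n_steps=20, police_fraction=0.35, seed=0):
    """Return (chosen k, dict of every k tried with its LOO error)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_police = int(len(shuffled) * police_fraction)
    # Steps 1 and 2: split the data between the Police and the Expert.
    police, expert = shuffled[:n_police], shuffled[n_police:]

    max_k = len(expert) - 1                # LOO leaves len(expert) - 1 points
    tried = {1: loo_error(expert, 1)}
    for _ in range(n_steps):               # Step 5: assumed fixed step budget
        best_k = min(tried, key=tried.get)
        # Step 3: propose a modification of the best metacode seen so far.
        candidate = min(max_k, max(1, best_k + rng.choice([-2, -1, 1, 2])))
        if candidate not in tried:
            tried[candidate] = loo_error(expert, candidate)   # Step 4

    # Step 6: the Police makes the final choice. With no Police data,
    # fall back to the Expert's own ranking.
    if not police:
        return min(tried, key=tried.get), tried
    chosen = min(tried, key=lambda k: holdout_error(expert, police, k))
    return chosen, tried
```

Calling blackbox_sketch(data) on a list of (x, y) pairs returns the k preferred by the held-out points together with every LOO score recorded during the search. Passing police_fraction=0.0 makes the Expert's ranking final.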

Frequently, the CV Expert (also referred to as the "inner loop" since it guides the internal searching) is LOO-XVE. It guides the generation of new metacodes to be considered: at each step, the chosen search algorithm generates several new metacodes as modifications of the metacodes that the CV Expert has rated best so far. Although LOO-XVE is a great way to avoid overfitting, it can occasionally overfit too. If enough metacodes are tried, the search may find one that just got lucky and produced a low LOO-XVE even though it will do poorly on future predictions.

To add an extra level of protection, Blackbox makes its final choice with the CV Police (also called the "outer loop" since it controls the overall final selection): it picks the metacode that does best according to the CV Police. By default, the CV Police is a random 35% of the original data set that is kept out of the data used by the CV Expert. This extra protection against overfitting is usually enough to ensure that a good metacode is chosen, but we will see an example later where a different kind of CV Expert and CV Police should be used.
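
A small simulation shows why a separate check helps. It is a sketch under simplifying assumptions that are not part of Vizier: every candidate metacode has the same true error, and the Expert and Police scores are modeled as independent Gaussian noise around that value. The function name selection_bias and its parameters are hypothetical.

```python
import random

def selection_bias(n_candidates, n_trials=500, true_error=1.0, noise=0.1, seed=0):
    """Average Expert and Police scores of the candidate the Expert ranks best.

    Every candidate has the same true error, so any gap between the two
    averages comes purely from picking the luckiest Expert score.
    """
    rng = random.Random(seed)
    expert_total = 0.0
    police_total = 0.0
    for _ in range(n_trials):
        expert_scores = [rng.gauss(true_error, noise) for _ in range(n_candidates)]
        police_scores = [rng.gauss(true_error, noise) for _ in range(n_candidates)]
        # The search keeps whichever candidate looks best to the Expert.
        winner = min(range(n_candidates), key=expert_scores.__getitem__)
        expert_total += expert_scores[winner]
        police_total += police_scores[winner]
    return expert_total / n_trials, police_total / n_trials
```

Comparing selection_bias(1) with selection_bias(200) shows the effect. With one candidate, both averages sit near the true error. With 200 candidates, the winner's Expert score falls well below the true error, while its Police score, measured on data the search never saw, stays close to it.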

You may have noticed when you ran Blackbox on a1.mbl earlier that the CV Police section of the dialogue box was set to file "None." That indicates that there will be no separate CV Police evaluation. Blackbox sets its default that way because a1.mbl has so few data points. If a small data set is further subdivided, there won't be many points for the CV Expert to run on, and there won't be enough for the CV Police to make a good check either.

The easiest thing to do when you get a new data file is simply to run Blackbox on it, as we did on a1.mbl. However, Blackbox is quite a versatile tool, and we will use the rest of this section to go over some of the options available. We'll also show some of the problems that can arise with automatic model selection via cross validation and what can be done to avoid them.


Jeff Schneider
Fri Feb 7 18:00:08 EST 1997