Vizier's Blackbox uses LOO-XVE to determine which is the best model for a particular data set. A summary of the full Blackbox algorithm is as follows:
Frequently, the CV Expert (also called the ``inner loop'' since it guides the internal search) is LOO-XVE. It guides the generation of new metacodes to consider: at each step, the chosen search algorithm generates several new metacodes as modifications of the best metacodes seen so far, as ranked by the CV Expert. Although LOO-XVE is a good guard against overfitting, it can itself be overfit occasionally. If enough metacodes are tried, the search may find one that simply got lucky and produced a low LOO-XVE even though it will do poorly on future predictions. To add an extra level of protection, Blackbox consults the CV Police (also called the ``outer loop'' since it controls the overall final selection) and takes as its final choice the metacode that does best according to it. By default, the CV Police is a random 35% of the original data set that is held out of the data used by the CV Expert. This extra protection against overfitting is usually enough to ensure that a good metacode is chosen, but we will see an example later where a different kind of CV Expert and CV Police should be used.
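The two-level check described above can be illustrated with a small Python sketch. This is not Vizier's code: it assumes a one-dimensional k-nearest-neighbor regressor as a stand-in for a metacode, a fixed list of candidate values of k in place of Blackbox's search algorithm, and mean absolute error as the error measure. The function names and the `keep_top` shortlist size are hypothetical, chosen only for illustration.

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k):
    """Predict by averaging the outputs of the k nearest training points."""
    nearest = np.argsort(np.abs(train_x - query_x))[:k]
    return train_y[nearest].mean()

def loo_xve(x, y, k):
    """Mean absolute leave-one-out error (assumed error measure)."""
    errors = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i          # drop point i, predict it from the rest
        pred = knn_predict(x[keep], y[keep], x[i], k)
        errors.append(abs(pred - y[i]))
    return float(np.mean(errors))

def holdout_error(train_x, train_y, test_x, test_y, k):
    """Mean absolute error on points the inner loop never saw."""
    preds = np.array([knn_predict(train_x, train_y, q, k) for q in test_x])
    return float(np.mean(np.abs(preds - test_y)))

def select_model(x, y, candidates, police_fraction=0.35, keep_top=3, seed=0):
    """Pick k using LOO-XVE (inner loop), then a held-out check (outer loop)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_police = int(round(police_fraction * len(x)))
    police, expert = order[:n_police], order[n_police:]

    # Inner loop ("CV Expert"): rank candidates by LOO-XVE on the expert data only.
    expert_scores = {k: loo_xve(x[expert], y[expert], k) for k in candidates}
    shortlist = sorted(candidates, key=lambda k: expert_scores[k])[:keep_top]

    # Outer loop ("CV Police"): final choice is the shortlisted candidate
    # with the lowest error on the held-out 35%.
    police_scores = {
        k: holdout_error(x[expert], y[expert], x[police], y[police], k)
        for k in shortlist
    }
    best = min(shortlist, key=lambda k: police_scores[k])
    return best, expert_scores, police_scores

# Example usage on synthetic data (a noisy sine curve).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 6.0, 100)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)
best_k, expert_scores, police_scores = select_model(x, y, [1, 2, 4, 8, 15])
print("chosen k:", best_k)
```

The held-out points play no part in ranking candidates during the inner loop, so a candidate that merely got lucky on LOO-XVE has to prove itself again on data it has not influenced.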
You may have noticed when you ran Blackbox on a1.mbl earlier that the CV Police section of the dialogue box was set to file ``None.'' This indicates that there will be no separate CV Police evaluation. Blackbox sets its default that way because a1.mbl has so few data points. If a small data set is further subdivided, the CV Expert will have few points to run on, and the CV Police will not have enough to make a good check either.
The easiest thing to do when you get a new data file is simply to run Blackbox on it, as we did on a1.mbl. However, Blackbox is quite a versatile tool, and we will use the rest of the section to go over some of the options available. We'll also show some of the problems that can arise with automatic model selection via cross-validation and what can be done to avoid them.