In this paper we consider algorithms for active learning which select data in an attempt to minimize the value of Equation 4, integrated over $X$. Intuitively, the minimization proceeds as follows: we assume that we have an estimate of $\sigma^2_{\hat y}$, the variance of the learner at $x$. If, for some new input $\tilde x$, we knew the conditional distribution $P(\tilde y \mid \tilde x)$, we could compute an estimate of the learner's new variance at $x$ given an additional example at $\tilde x$. While the true distribution $P(\tilde y \mid \tilde x)$ is unknown, many learning architectures let us approximate it by giving us estimates of its mean $\hat y(\tilde x)$ and variance $\sigma^2_{\tilde y}$. Using the estimated distribution of $\tilde y$, we can estimate $\langle \tilde\sigma^2_{\hat y} \rangle$, the expected variance of the learner after querying at $\tilde x$.
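As a concrete illustration, here is a minimal sketch of this variance estimate for a linear learner with known noise variance (an assumption made for this example; it is not the only architecture to which the approach applies). For a linear learner the post-query variance has a closed form that depends only on the query input $\tilde x$, not on the unobserved output $\tilde y$, so the expectation over the estimated distribution of $\tilde y$ is exact:

```python
import numpy as np

def predictive_variance(XtX_inv, x, noise_var=1.0):
    """Variance of a linear learner's prediction at x, given the
    inverse Gram matrix (X^T X)^{-1} of the training inputs."""
    return noise_var * x @ XtX_inv @ x

def expected_variance_after_query(XtX, x, x_query, noise_var=1.0):
    """Expected variance of the learner at x after adding a
    hypothetical example at x_query.  For a linear learner the
    updated Gram matrix depends only on the query input, so no
    averaging over the unknown output is needed."""
    XtX_new = XtX + np.outer(x_query, x_query)
    return predictive_variance(np.linalg.inv(XtX_new), x, noise_var)
```

For nonlinear learners the expectation would instead be taken numerically over the estimated distribution of $\tilde y$; the linear case simply makes the bookkeeping explicit.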
Given the estimate of $\langle \tilde\sigma^2_{\hat y} \rangle$, which applies to a given $x$ and a given query $\tilde x$, we must integrate $x$ over the input distribution to compute the integrated average variance of the learner. In practice, we will compute a Monte Carlo approximation of this integral, evaluating $\langle \tilde\sigma^2_{\hat y} \rangle$ at a number of reference points drawn according to $P(x)$. By querying an $\tilde x$ that minimizes the average expected variance over the reference points, we have a solid statistical basis for choosing new examples.
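The Monte Carlo selection step above can be sketched as follows, again under the simplifying assumption of a linear learner (the function names and the candidate-set formulation are illustrative, not part of the original method's presentation):

```python
import numpy as np

def average_expected_variance(XtX, x_query, reference_points, noise_var=1.0):
    """Monte Carlo estimate of the integrated variance after querying
    at x_query: average the expected variance over reference points
    drawn from the input distribution P(x)."""
    XtX_new = XtX + np.outer(x_query, x_query)
    inv = np.linalg.inv(XtX_new)
    return float(np.mean([noise_var * x @ inv @ x for x in reference_points]))

def select_query(XtX, candidates, reference_points, noise_var=1.0):
    """Return the candidate query minimizing the average expected
    variance over the reference points."""
    scores = [average_expected_variance(XtX, q, reference_points, noise_var)
              for q in candidates]
    return candidates[int(np.argmin(scores))]
```

With reference points concentrated along one input direction, the selected query aligns with that direction, since reducing variance where $P(x)$ has mass is what the integrated criterion rewards.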