There are several ways to set the smoothing parameter k. The method used by Cleveland et al. [1988] sets k so that the reference point being predicted has a predetermined amount of support, that is, so that the total weight n is close to some target value. This has the disadvantage of requiring assumptions about the noise and smoothness of the function being learned. Another technique, used by Schaal and Atkeson [1994], sets k to minimize the cross-validated error on the training set. A disadvantage of this technique is that it assumes the distribution of the training set is representative of the input distribution, which it may not be in an active learning situation. A third method, also described by Schaal and Atkeson [1994], sets k to minimize the estimated variance of the prediction at the reference points. As k decreases, the regression becomes more global. The total weight n increases (which decreases the estimated variance), but so does the conditional variance (which increases it). At some value of k, these two effects balance to produce a minimum estimated variance (see Figure 3). This estimate can be computed for arbitrary reference points in the domain, and the user may use either a different k for each reference point or a single global k that minimizes the average estimate over all reference points. Empirically, we found that the variance-based method gave the best performance.
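
As a rough illustration of the first two strategies, the sketch below picks k by a target amount of support and by leave-one-out cross-validation. It rests on several assumptions that this passage does not state: a Gaussian weighting h_i = exp(-k (x - x_i)^2) with total weight n equal to the sum of the h_i (chosen so that a smaller k gives a more global fit), one-dimensional inputs, and a locally weighted mean standing in for the full regression. All function names are hypothetical. The variance-based method is not shown, because its estimator depends on formulas defined elsewhere in the paper.

```python
import math

def weights(x_ref, xs, k):
    # Assumed kernel: h_i = exp(-k * (x_ref - x_i)^2); smaller k is more global.
    return [math.exp(-k * (x_ref - x) ** 2) for x in xs]

def total_weight(x_ref, xs, k):
    # Total weight n, i.e. the amount of support at x_ref.
    return sum(weights(x_ref, xs, k))

def k_for_target_support(x_ref, xs, target_n, k_lo=1e-6, k_hi=1e6, iters=100):
    # Bisect on log(k): n decreases monotonically as k grows.
    lo, hi = math.log(k_lo), math.log(k_hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_weight(x_ref, xs, math.exp(mid)) > target_n:
            lo = mid   # too much support, so sharpen the kernel
        else:
            hi = mid   # too little support, so widen the kernel
    return math.exp(0.5 * (lo + hi))

def loo_cv_error(xs, ys, k):
    # Mean squared leave-one-out error of a locally weighted mean
    # (a simplification of the locally weighted regression).
    err = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        rest_x = xs[:j] + xs[j + 1:]
        rest_y = ys[:j] + ys[j + 1:]
        h = weights(xj, rest_x, k)
        n = sum(h)
        if n == 0.0:
            return math.inf   # kernel so narrow that no point has support
        y_hat = sum(hi * yi for hi, yi in zip(h, rest_y)) / n
        err += (yj - y_hat) ** 2
    return err / len(xs)

def k_by_cross_validation(xs, ys, candidates):
    # Pick the candidate k with the smallest leave-one-out error.
    return min(candidates, key=lambda k: loo_cv_error(xs, ys, k))

# Usage on toy data (illustrative values only).
xs = [i / 10 for i in range(11)]
ys = [math.sin(2 * math.pi * x) for x in xs]
k_support = k_for_target_support(0.5, xs, target_n=3.0)
k_cv = k_by_cross_validation(xs, ys, [0.01, 1.0, 10.0, 100.0, 1000.0])
```

Note that the support-based choice needs a target n to be supplied in advance, while the cross-validated choice scores k only at the training inputs. These are the two limitations the text attributes to these methods.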