The set of input attributes, for which we want to make a prediction about the resulting output attributes, is called the query, or query point. The first step in making a prediction with MBL is to look through the database to find all the data points whose input attributes are similar to the query point. In order to do that, we have to define what is meant by similar. We need to define a distance metric that tells how close two points are.
Vizier uses a scaled Euclidean distance metric ($L_2$ norm). The distance between two points (between their input attributes) is defined by:

$$d(\mathbf{x}_i, \mathbf{x}_j) = \sqrt{(\mathbf{x}_i - \mathbf{x}_j)^T M \, (\mathbf{x}_i - \mathbf{x}_j)}$$

where $M$ is a diagonal matrix and $\mathbf{x}$ refers to a vector of input attributes. Other distance metrics include the $L_1$ norm (sometimes called Manhattan distance), the $L_\infty$ norm, and Mahalanobis distance (same as scaled Euclidean except that $M$ is required to be symmetric, but not necessarily diagonal). Scaled Euclidean distance works well for most cases and we will not discuss the other metrics any further in this tutorial.
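To make the scaled Euclidean distance concrete, here is a minimal Python sketch. It is not Vizier's code. It assumes the distance takes the form $\sqrt{\sum_k m_k (x_{ik} - x_{jk})^2}$, where the $m_k$ are the diagonal entries of the scaling matrix, so each $m_k$ controls how much input attribute $k$ contributes. The function names `scaled_euclidean` and `nearest_points`, the `scales` argument, and the sample data are hypothetical, chosen only for illustration.

```python
import numpy as np

def scaled_euclidean(x_i, x_j, scales):
    """Scaled Euclidean distance between two input vectors.

    `scales` holds the diagonal entries of the (assumed) diagonal
    scaling matrix, one non-negative weight per input attribute.
    """
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    weights = np.asarray(scales, dtype=float)
    return float(np.sqrt(np.sum(weights * diff ** 2)))

def nearest_points(data, query, scales, k=3):
    """Return indices and distances of the k stored points closest to the query."""
    data = np.asarray(data, dtype=float)
    diffs = data - np.asarray(query, dtype=float)
    weights = np.asarray(scales, dtype=float)
    # One distance per row of `data`.
    dists = np.sqrt(np.sum(weights * diffs ** 2, axis=1))
    order = np.argsort(dists)[:k]
    return order, dists[order]

if __name__ == "__main__":
    # Hypothetical database of input attributes (one row per data point).
    database = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    query = [0.9, 0.9]
    idx, d = nearest_points(database, query, scales=[1.0, 1.0], k=2)
    print("nearest indices:", idx)
    print("distances:", d)
```

With all scales equal to 1 this reduces to ordinary Euclidean distance. Setting a scale to 0 makes the corresponding attribute irrelevant to the notion of similarity, and a larger scale makes differences in that attribute count for more.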