Schapire and Singer (1998) have investigated classification problems in which the aim of the
procedure is not to provide a single, accurate class for some observation. Rather,
the algorithm outputs a set of values (one for each class), and we expect the
class of the observation to receive the largest value of all, thus being
ranked higher than all others. This approach is particularly useful when
a given example can belong to more than one class (multilabel problems), in which case we expect each of
these classes to receive larger values than the classes the
example does not belong to.
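To illustrate this ranking criterion, here is a minimal Python sketch (the helper \texttt{correctly\_ranked}, the score vector and the membership vector are hypothetical, introduced only for illustration):

\begin{verbatim}
# Sketch: check the multilabel ranking criterion for one example.
# scores[k] is the value output for class k; membership[k] is 1 if
# the example belongs to class k and 0 otherwise (both hypothetical).

def correctly_ranked(scores, membership):
    """True iff every class the example belongs to receives a
    strictly larger value than every class it does not belong to."""
    positives = [s for s, m in zip(scores, membership) if m == 1]
    negatives = [s for s, m in zip(scores, membership) if m == 0]
    return all(p > n for p in positives for n in negatives)

# An observation belonging to classes 0 and 2 (out of c = 4):
print(correctly_ranked([2.1, -0.3, 1.7, 0.4], [1, 0, 1, 0]))  # True
print(correctly_ranked([2.1, 0.9, 0.5, 0.4], [1, 0, 1, 0]))   # False
\end{verbatim}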
The ranking loss informally represents the
number of times the hypothesis fails to rank a class of an
example higher than a class to which the example does not belong. Before going further, we first generalize our classification setting, and replace the common notation $(\mathbf{x}, y)$ for an example by the more general one
$(\mathbf{x}, \mathbf{y})$. Here,
$\mathbf{y} \in \{0, 1\}^{c}$ is a vector giving, for each of the $c$ classes, the membership to the class (``0'' is no and ``1'' is yes) of the corresponding observation
$\mathbf{x}$. It is important to note that this setting is more general than the usual Bayesian setting, in which there can exist examples
$(\mathbf{x}, y)$ and
$(\mathbf{x}', y')$ (using the non-vector notation) for which
$\mathbf{x} = \mathbf{x}'$ but
$y \neq y'$. Ranking loss generalizes the Bayesian setting to multilabel problems, and postulates that there can be some examples for which we cannot provide a single class at a time, even if, e.g., each of the classes to which the example belongs may later appear independently with the same observation.
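To fix ideas, one possible formalization of this quantity, following Schapire and Singer's definition (the symbols $m$, $\mathbf{x}_i$, $\mathbf{y}_i$ and the per-class outputs $v_k(\cdot)$ are introduced here only for illustration), is
\[
\mathrm{rloss} = \frac{1}{m} \sum_{i=1}^{m} \frac{\left|\left\{ (c_0, c_1) \;:\; y_{i,c_0} = 0,\ y_{i,c_1} = 1,\ v_{c_0}(\mathbf{x}_i) \geq v_{c_1}(\mathbf{x}_i) \right\}\right|}{\left|\left\{ k : y_{i,k} = 1 \right\}\right| \times \left|\left\{ k : y_{i,k} = 0 \right\}\right|} ,
\]
i.e. the fraction of (wrong class, right class) pairs that the hypothesis orders incorrectly, averaged over the $m$ examples.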
Ranking loss boosting replaces each example $(\mathbf{x}, \mathbf{y})$ by a set of
$H(\mathbf{y}) \times (c - H(\mathbf{y}))$ examples, where
$H(\mathbf{y})$ denotes the Hamming weight of
$\mathbf{y}$ (i.e. the number of classes to which the example belongs). Each of these new examples is denoted
$(\mathbf{x}, c_0, c_1)$, where
$c_0$ and
$c_1$ span all values in
$\{1, 2, \ldots, c\}$. The distribution of the new examples is renormalized, so that
$w(\mathbf{x}, c_0, c_1) = w(\mathbf{x}) / (H(\mathbf{y}) \times (c - H(\mathbf{y})))$ whenever
$y_{c_0} = 0$ and
$y_{c_1} = 1$, and
$w(\mathbf{x}, c_0, c_1) = 0$ otherwise.
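As a sketch of this replication step (the function name, the 0/1 list representation of $\mathbf{y}$ and the dictionary output are assumptions made for illustration), one may compute the renormalized weights as follows:

\begin{verbatim}
# Sketch: turn one example with membership vector y and weight w
# into the renormalized weights of its triples (x, c0, c1), with
# y[c0] = 0 and y[c1] = 1, as described above.

def expand_example(y, w):
    """Return {(c0, c1): weight} for the nonzero-weight triples."""
    h = sum(y)                    # Hamming weight H(y)
    n_pairs = h * (len(y) - h)    # H(y) * (c - H(y)) valid pairs
    return {
        (c0, c1): w / n_pairs
        for c0 in range(len(y)) if y[c0] == 0
        for c1 in range(len(y)) if y[c1] == 1
    }

# An example belonging to classes 0 and 2 (out of c = 4), weight 0.25,
# yields 2 * 2 = 4 triples, each of weight 0.0625:
print(expand_example([1, 0, 1, 0], 0.25))
\end{verbatim}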
Take some monomial obtained from the large DC, and all examples satisfying it. We now work with this restricted subset of examples, while calculating the corresponding vector
$\mathbf{v}$ of
$c$ values (one for each class). Schapire and Singer (1998) propose a cost function which we should minimize in order to minimize the ranking loss. This function is