Let $c$ be the number of classes. Unless otherwise specified, an example is a couple $(o, y)$ where $o$ is an observation described over $n$ variables, and $y$ its corresponding class among $\{1, 2, \ldots, c\}$; to each example $e$ is associated a weight $w(e)$, representing its appearance probability with respect to a learning sample $LS$ at our disposal. $LS$ is itself a subset of a whole domain which we denote $\mathcal{X}$. Obviously, we do not have entire access to $\mathcal{X}$ ($LS \subseteq \mathcal{X}$): in general, we even have $|LS| \ll |\mathcal{X}|$ ($|\cdot|$ denotes the cardinality; we suppose in all that follows that $\mathcal{X}$ is discrete with finite cardinality). In the particular case where $c = 2$, the two classes are noted ``$-$'' and ``$+$'', and called respectively the negative and positive class. The learning sample is then the union of two samples, noted $LS^-$ and $LS^+$, containing respectively the negative and positive examples. It is worthwhile to think of the positive examples as belonging to a subset of $\mathcal{X}$ containing all possible positive examples, usually called the target concept.
Part of our goal in machine learning is to build a reliable approximation of the true classification of the examples in $\mathcal{X}$, that is, a good approximation of the target concept, by using only the examples in $LS$. Good approximations shall have a high accuracy over $\mathcal{X}$; we do not have access to this quantity, but rather to an estimator of it: a more or less reliable accuracy computable over $LS$. We refer the reader to standard machine learning books [Mitchell1997] for further considerations about this issue.
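For concreteness, with the notation above (the hypothesis symbol $h$ and the hat notation are ours, added for illustration), this estimator is the weighted empirical accuracy
\[
\widehat{\mathrm{acc}}_{LS}(h) \; = \sum_{(o, y) \in LS \,:\, h(o) = y} w((o, y)) \:,
\]
i.e., the total weight of the examples of $LS$ that the hypothesis $h$ classifies correctly.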
A DC contains two parts:
- A set of unordered pairs (or rules) $\{(t_1, \vec{v}_1), (t_2, \vec{v}_2), \ldots, (t_r, \vec{v}_r)\}$, where each $t_j$ is a monomial (a conjunction of literals) over $\{x_1, \overline{x_1}, x_2, \overline{x_2}, \ldots, x_n, \overline{x_n}\}$ ($n$ being the number of description variables; each $x_i$ is a positive literal and each $\overline{x_i}$ is a negative literal), and each $\vec{v}_j$ is a vector in $\mathbb{R}^c$. For the sake of readability, this vectorial notation shall be kept throughout the paper, even for problems with only two classes. One might choose to add a single real rather than a two-component vector in that case.
- A default vector $\vec{v}_{\mathrm{def}}$ in $\mathbb{R}^c$. Again, in the two-class case, it is sufficient to replace $\vec{v}_{\mathrm{def}}$ by a default class in $\{-, +\}$.
For any observation $o$ and any monomial $t$, the proposition ``$o$ satisfies $t$'' is denoted by ``$o \models t$''. The opposite proposition ``$o$ does not satisfy $t$'' is denoted by ``$o \not\models t$''. The classification of any observation $o$ is made in the following way: define $\vec{v}(o)$ as follows:
\[
\vec{v}(o) \; = \;
\begin{cases}
\sum_{j \,:\, o \models t_j} \vec{v}_j & \text{if } o \models t_j \text{ for at least one } j \:,\\
\vec{v}_{\mathrm{def}} & \text{otherwise} \:.
\end{cases}
\]
The class assigned to $o$ is then:
\[
\mathop{\mathrm{argmax}}_{k \in \{1, 2, \ldots, c\}} \left(\vec{v}(o)\right)_k \:.
\]
In other words, if the maximal component of $\vec{v}(o)$ is unique, then its index gives the class assigned to $o$. Otherwise, we take the index of the maximal component of $\vec{v}(o)$ corresponding to the maximal component of $\vec{v}_{\mathrm{def}}$ (remaining ties are solved by a random choice among the maximal components).
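To make the decision rule concrete, the following minimal Python sketch implements it (the class name DecisionCommittee, the encoding of a monomial as a dictionary of required variable values, and the field names are our own illustrative choices, not notation from this paper):
\begin{verbatim}
import random

class DecisionCommittee:
    # rules: list of (monomial, vector) pairs; a monomial is a dict
    # mapping a variable index to the required boolean value, and each
    # vector has one component per class.
    def __init__(self, rules, default_vector):
        self.rules = rules
        self.default_vector = default_vector

    @staticmethod
    def satisfies(observation, monomial):
        # "o satisfies t" holds iff every literal of t holds in o.
        return all(observation[i] == val for i, val in monomial.items())

    def classify(self, observation):
        # Sum the vectors of all rules whose monomial is satisfied;
        # if no monomial is satisfied, fall back on the default vector.
        fired = [v for t, v in self.rules if self.satisfies(observation, t)]
        if fired:
            score = [sum(parts) for parts in zip(*fired)]
        else:
            score = list(self.default_vector)
        # Index of the maximal component; ties are broken first by the
        # default vector, then by a random choice.
        best = max(score)
        tied = [k for k, s in enumerate(score) if s == best]
        if len(tied) > 1:
            best_def = max(self.default_vector[k] for k in tied)
            tied = [k for k in tied if self.default_vector[k] == best_def]
        return random.choice(tied)
\end{verbatim}
Observations are encoded as sequences of Boolean values, one per description variable; a negative literal $\overline{x_i}$ corresponds to requiring the value False at index $i$.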
DC contains a subclass which is among the largest classes of Boolean formulas known to be PAC-learnable [Nock Gascuel1995]; however, this class is less interesting from a practical viewpoint, since its rules can be numerous and hard to interpret. Nevertheless, another subclass of DC [Nock Gascuel1995] presents an interesting compromise between representational power and interpretability. In this class, which is used by WIDC, each of the vector components is restricted to $\{-1, 0, +1\}$ and each monomial is present at most once. The values $-1$, $0$, $+1$ allow natural interpretations of the rules, each component being either in favor of the corresponding class ($+1$), neutral with respect to the class ($0$), or in disfavor of the corresponding class ($-1$). This subclass, to which we refer as DC$_{\{-1, 0, +1\}}$, suffers, as we now prove, from the same algorithmic drawbacks as DT [Hyafil Rivest1976] and DL [Nock Jappy1998]: even without restricting the components of the vectors, or with any restriction to a set containing at least one real value, the construction of small formulas with sufficiently high accuracy is hard. This is a clear motivation for using heuristics in decision committee induction.
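As a toy two-class illustration of this subclass (reusing the sketch given after the classification rule; the rules themselves are invented for the example):
\begin{verbatim}
# Classes: index 0 is "-", index 1 is "+"; components lie in {-1, 0, +1}.
dc = DecisionCommittee(
    rules=[({0: True, 2: False}, [-1, +1]),   # x1 AND not(x3): for "+", against "-"
           ({1: True},           [+1,  0])],  # x2: for "-", neutral on "+"
    default_vector=[+1, -1])                  # default leans towards "-"

print(dc.classify([True, False, False]))      # only the first rule fires -> 1 ("+")
\end{verbatim}
Each $\{-1, 0, +1\}$ component of a rule reads directly as a vote for, an abstention on, or a vote against the corresponding class, which is precisely what makes this subclass easy to interpret.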