Next: Proof of Theorem 3
Up: Appendix A
Previous: Appendix A
Since the hardness
results of Theorems 1 and 3 are stated for the two-classes case, we shall use the notation
for some arbitrary rule
, where
is the value for class
``-'' and
is the value for class ``+''. A positive value for
means that
is in favor of class ``+'' whereas a negative value
gives a
in favor of class ``-''. Value
for
gives a
neutral with respect
to the classes. We use a reduction from the
-Hard problem ``Minimum Cover'' [Garey Johnson1979]:
- Name : ``Minimum Cover''.
- Instance : A collection
of subsets of a finite set
. A positive integer
,
.
- Question : Does
contain a cover of size at most
, that is, a subset
with
, such that any element of
belongs to at least one member of
?
The reduction is constructed as follows : from a ``Minimum Cover'' instance we build a learning sample
such that if there exists a cover of size
of
, then there exists a decision committee with
literals consistent with
, and, reciprocally, if there exists a decision committee with
literals consistent with
, then there exists a cover of size
of
. Hence, finding the smallest decision committee consistent with
is equivalent to finding the smallest
for which there exists a solution to ``Minimum Cover'', and this is intractable if
.
Let
denote the
element of
, and
the
element
of
. We define a set of
Boolean variables in one to one correspondence with the elements of
, which we
use to describe the examples of
. The corresponding set of literals is
denoted
. The sample
contains two
disjoint subsets : the set of positive examples
, and the set of
negative ones
.
contains
examples, denoted by
. We construct each positive example so that it encodes
the membership of the corresponding element of
in the elements of
. More precisely,
contains a single negative example, defined by:
Suppose there exists a cover
of
satisfying
. We create a decision committee consisting of
monomials, each
with one literal only and associated to a positive
. Each monomial
codes one of the sets in
. The default class is ``-''. This decision committee is consistent with the examples of
, otherwise some element of
would not be covered. If there are only two values authorized for the vectors and they are
, we simply create a DC consisting of one monomial with negative literals associated to a negative
(the value for the negative class is greater than the one of the positive class); each of the negative literals codes one of the sets in
. The default class is ``+''.
Suppose now that there exists a decision committee
with at most
literals consistent with
. Denote
each monomial of
, in no specific order, and
their associated values for
. The monomials of
can belong to three types of subsets of monomials:
- monotonous monomials (without negative literals),
- monomials containing only negative literals,
- monomials containing positive and negative literals.
Let us call respectively
,
,
these three classes. Given that each monomial of
can be associated to a positive or a negative
, there exists on the whole six classes of rules, presented in Figure 5.
Figure 5:
The six possible cases of rules.
 |
Any monomial of
containing at least one positive literal can only be satisfied by positive examples. Therefore, if there exists rules belonging to class II or VI, we can remove them without losing consistency. Furthermore, since
contains only negative literals, if we remove their negative literals from all rules belonging to class V (making them go to class I), we do not lose consistency. As a consequence, we can suppose without loss of generality that all rules of
are in class I, III, or IV.
We now treat independently two cases, depending on whether the default class of
is ``+'' or ``-''.
- The default class is ``-''. Any positive example satisfies therefore a monomial in
. There can exist two types of positive examples: those satisfying at least one rule of class I, and those not satisfying any class I rule (therefore satisfying at least one rule of class III).
satisfies all class III and IV rules. Therefore,
 |
|
|
(11) |
This shows that, if a positive example not satisfying any class I rules would satisfy all class IV rules, then it would be misclassified, which is impossible by the consistency hypothesis. This gives an important property, namely that any positive example not satisfying any class I rule cannot satisfy all class IV rules. Let us call P this property in what follows. We now show how to build a valid solution to ``Minimum Cover'' with at most
elements. For any positive example
,
- if
satisfies at least one class I rule, choose in
a subset of
corresponding to a positive literal of some satisfied class I rule. This subset contains
.
- if
does not satisfy any class I rule, there exists from P some class IV rule which is not satisfied. Among all negative literals of a class IV rule which is not satisfied by
, choose one which is positive in
(causing it not to satisfy the rule), and then choose the corresponding element of
. This subset of
contains
.
Iterating the above procedure for all positive examples, we obtain a cover of
consisting of at most
subsets of
.
- The default class is ``+''.
satisfies all class III and IV rules. Therefore,
 |
|
|
(12) |
Even if the inequality is now strict, it gives the same procedure for efficiently building the solution to ``Minimum Cover'' with at most
elements, by using the same argument as in the preceeding case.
This ends the proof of Theorem 1.
Next: Proof of Theorem 3
Up: Appendix A
Previous: Appendix A
©2002 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.