Interpretability Issues



**Figure 1:** A DC obtained on the XD6 domain with WIDC(p). The first three rules exactly encode the target concept, and the irrelevant variable is absent from the DC.
(Table truncated in the source; only its last row survives: default $\vec{D}$ = (0.963, 0.037).)

**Figure 2:** Part of a DT obtained on the XD6 domain with C4.5. Positive literals label the internal nodes. To classify an observation, the left edge of a node is followed when the observation contains the positive literal ("*Yes*"), and the right edge is followed otherwise (*i.e.* the literal is negative in the observation). The bold square marks the presence of the irrelevant variable in the tree. A naive conversion of this tree into rules for both classes generates 30 rules, for a total of 179 literals.
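The rule and literal counts quoted in the caption of Figure 2 come from a naive tree-to-rule conversion. The following is a minimal Python sketch of one such conversion, assuming that each root-to-leaf path becomes one rule and that each edge on the path contributes one literal. The `Leaf` and `Node` types and the toy tree are hypothetical illustrations; they are not the C4.5 tree of Figure 2.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    label: str  # class predicted at this leaf


@dataclass
class Node:
    variable: str   # positive literal labelling the internal node
    yes: "Tree"     # left edge: the literal is positive in the observation
    no: "Tree"      # right edge: the literal is negative in the observation


Tree = Union[Leaf, Node]
Rule = Tuple[List[Tuple[str, bool]], str]  # (conjunction of literals, class)


def tree_to_rules(tree: Tree, path: Tuple[Tuple[str, bool], ...] = ()) -> List[Rule]:
    """Turn every root-to-leaf path into one rule (assumed naive conversion)."""
    if isinstance(tree, Leaf):
        return [(list(path), tree.label)]
    return (
        tree_to_rules(tree.yes, path + ((tree.variable, True),))
        + tree_to_rules(tree.no, path + ((tree.variable, False),))
    )


# Hypothetical toy tree encoding x0 AND x1 (not taken from the paper).
toy = Node("x0", Node("x1", Leaf("+"), Leaf("-")), Leaf("-"))

rules = tree_to_rules(toy)
n_rules = len(rules)                                   # one rule per leaf
n_literals = sum(len(literals) for literals, _ in rules)  # one literal per edge on each path
print(n_rules, n_literals)
```

Under this reading, the number of rules equals the number of leaves and the number of literals equals the sum of the leaf depths, which is why a tree that is only moderately deep can still produce a large rule set.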

In the XD6 domain, each example has 10 binary variables. The tenth is irrelevant in the strongest sense [John et al. 1994]. The target concept is a 3-DNF (a DNF in which each monomial contains at most three literals) over the first nine variables: $(x_0 \wedge x_1 \wedge x_2) \vee (x_3 \wedge x_4 \wedge x_5) \vee (x_6 \wedge x_7 \wedge x_8)$. Such a formula is typically hard to encode using a small decision tree. In our experiments with WIDC(o) and WIDC(p), we observed that the target formula itself is almost always an element of the classifier built, and that the irrelevant attribute is always absent. Figure 1 shows an example of a DC obtained on a run of WIDC. Note that the concept returned is a 3-DC. Figure 2 depicts part of a tree obtained on this domain with C4.5. The tree is quite large for the domain, and it contains the irrelevant variable, which enlarges the tree and makes it harder to mine.
On many other domains, we observed rules or subconcepts that persisted through the 10 cross-validation runs. As with XD6, whenever we could mine the results with sufficiently accurate knowledge of the domain, these patterns were most interesting. For example, the DCs obtained on the LEDeven domain contained, most of the time, a combination of two rules with one literal each, which represented a very accurate way to classify 9 out of the 10 possible classes. On the Vote0 and Vote1 domains, we also observed constant patterns, some of which are well known [Blake et al. 1998] to provide a very accurate classification for a tiny size. Even on Vote1, where classical studies often report errors over $12\%$ and almost never around $10\%$ [Holte 1993], we observed on most runs a DC containing an accurate rule with only two literals, with which WIDC(p) achieved an average error under $10\%$.
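The XD6 target concept described above can be written down directly. The Python sketch below evaluates the 3-DNF and checks that the tenth variable never affects the label. It enumerates all $2^{10}$ assignments, which is an assumption made for illustration: it is not the sample used in the experiments, and the function and variable names are hypothetical.

```python
from itertools import product


def xd6_target(x):
    """3-DNF over x[0..8]; x[9] is deliberately never read."""
    return bool(
        (x[0] and x[1] and x[2])
        or (x[3] and x[4] and x[5])
        or (x[6] and x[7] and x[8])
    )


# All assignments of the 10 binary variables (illustrative enumeration only).
examples = list(product((0, 1), repeat=10))

# Number of assignments labelled positive by the target concept.
n_positive = sum(xd6_target(x) for x in examples)

# Flipping the tenth variable never changes the label.
x9_is_irrelevant = all(
    xd6_target(x[:9] + (0,)) == xd6_target(x[:9] + (1,)) for x in examples
)

print(len(examples), n_positive, x9_is_irrelevant)
```

Each monomial is false for 7 of the 8 assignments of its three variables, so $7^3 = 343$ of the $512$ assignments of the first nine variables are negative and $169$ are positive; doubling for the tenth variable gives $338$ positive assignments out of $1024$.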
WIDC was also compared to C4.5 on a real-world domain in which mining issues are as crucial as classification strength: agriculture. An experiment is being carried out in Martinique by the DDAF (Departmental Direction of Agriculture and Forest) to better understand the behavior of farmers, in particular their willingness to contract a CTE (Farming Territorial Contract). Usual farming contracts with either the state (France) or Europe did not contain commitments for the farmer to satisfy. In a CTE, each farmer commits to adapt and/or change his agricultural techniques or productions, to ensure sustainable development for local agriculture. In exchange, he is guaranteed financial help for this contract and training in new agricultural techniques. Such a domain is a good test bed for evaluating a method on the basis of predictability and interpretability, because of the place of uncertainty in agriculture and because obtaining data can be a long and hard task: the DDAF has to be as accurate as possible in its predictions and interpretations in order to manage its relationships with farmers as well as possible and, in the case of CTEs, to run the best possible promotion campaign for these new contracts. Agriculture is also very sensitive to a "showcase effect": once even a few representative farmers have subscribed to the contracts, comparatively many others are likely to follow.
In this study, from the description of 52 variables for about 60 representative farmers satisfying the criteria to adhere to a CTE, the aim is to develop models for those who are actually willing to adhere, those not willing to adhere, and those currently uncertain. The variables describe each farm (size, terrain nature, financial data, type of production, etc.), as well as more personal data on the farmers (education, family status, objectives, personal answers to a questionnaire, etc.). This is a small dataset to mine but, interestingly, the results obtained differed depending on whether it was processed with C4.5 or with WIDC(p).
We ran both algorithms in a 10-fold stratified cross-validation experiment. WIDC(p) obtained a $2.8\%$ average error. In 6 out of 10 runs, the same DC was induced. It is presented in Figure 3. Basically, this DC indicates that predicting the "$\neg$adhere" class is the easiest task, followed by the prediction of the "adhere" class. The "?" class (uncertain farmers) is predicted only by the default vector. This seems rather natural: whereas the extreme behaviors tend to be easy to determine, uncertainty is the hardest to predict.
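For readers unfamiliar with the protocol, the following is a minimal Python sketch of how a 10-fold stratified split can be built, assuming a simple round-robin assignment within each class. The function name, the seed, and the class sizes are hypothetical placeholders; the sketch does not reproduce the partition or the class distribution used in this experiment.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split example indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across classes to balance fold sizes
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for j, index in enumerate(indices):
            folds[(offset + j) % k].append(index)
        offset += len(indices)
    return folds


# Placeholder labels for 60 examples and 3 classes (made-up class sizes).
labels = ["adhere"] * 25 + ["not_adhere"] * 25 + ["uncertain"] * 10
folds = stratified_folds(labels, k=10)

# Each fold serves once as the test set; the other nine form the training set.
for test_fold in folds:
    held_out = set(test_fold)
    train = [i for i in range(len(labels)) if i not in held_out]
    assert len(train) + len(test_fold) == len(labels)

per_fold_counts = [Counter(labels[i] for i in fold) for fold in folds]
print([len(fold) for fold in folds])
```

With this assignment, the number of examples of any given class differs by at most one between folds, which matters on a dataset this small, where an unstratified split could leave a fold without any example of the rarest class.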
C4.5 (default parameters) induced a DT which was almost the exact transcription of rule 1, a rule which says that farmers with no education (without any agricultural diploma or traineeship) and no ongoing project are not willing to adhere. This rule is interesting mostly because it indicates that education is a strong factor determining the "$\neg$adhere" answer. The DTs induced also contained one or two more literals separating the "adhere" and "?" classes (average error: $6.7\%$), but little else could be mined from the trees of C4.5 in light of the problem addressed.
Rule 2 in Figure 3 had no equivalent in the DTs induced. What it says is interesting for the DDAF, because it leads to the following conclusion: farmers without ongoing projects, and not selling their products only to a wholesaler, are on the knife edge for their membership (either in "adhere" or in "$\neg$adhere"). Without going further into local agricultural considerations, for the DDAF engineers this rule gives an accurate view of the farmers who actually control their operating costs and who are either for or against CTEs. It also suggests that education pushes towards membership (combination of rules 1 and 2), probably because it allows farmers to see the future potential benefits of the contract rather than only its current constraints.

**Figure 3:** The DC obtained on the agricultural data (see text for the interpretation of the variables).
(Table truncated in the source; only its last row survives: default $\vec{D}$ = (0.32, 0.68, 0).)
