We consider three different ways in which TILDE can be run in its ILPROLOG implementation: (1) without query packs, executing the candidate queries separately via meta-calls; (2) with query packs, but executing the pack in a disjoint manner, i.e., running the compiled queries one by one without exploiting the pack structure; and (3) with query packs, executed as packs so that computation shared among the queries is performed only once.
The most interesting information is obtained from two comparisons: (a) the actual query evaluation time in settings 2 and 3, which shows the efficiency gain obtained by the removal of redundant computation itself (abbreviated as exec in the tables); and (b) the total execution time in settings 1 and 3, which indicates how much is gained by implementing packs in an ILP system when all other effects are taken into account (re-implementation of the computation of heuristics via a bit matrix, use of compiled queries instead of meta-calls, etc.), in other words, the net effect of the whole re-implementation (indicated as net in the tables).
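In symbols (the notation is ours and does not appear in the tables; $T^{(i)}_{\mathrm{exec}}$ and $T^{(i)}_{\mathrm{total}}$ denote the query evaluation time and the total induction time in setting $i$), these two comparisons amount to the speedup factors
\[
  \mathit{exec} = \frac{T^{(2)}_{\mathrm{exec}}}{T^{(3)}_{\mathrm{exec}}},
  \qquad
  \mathit{net} = \frac{T^{(1)}_{\mathrm{total}}}{T^{(3)}_{\mathrm{total}}}.
\]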
In a first experiment we used Bongard problems, varying (1) the size of the data sets; (2) the complexity of the target hypothesis; and (3) TILDE's lookahead parameter. The complexity of the target hypothesis can be small, medium, or none. In the latter case the examples are random, which causes TILDE to grow ever larger trees in an attempt to find a good hypothesis; the size of the final tree then typically depends on the size of the data set. The lookahead parameter is used to control the number of levels the pack contains; with lookahead $l$, packs of depth $l+1$ are generated.
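As an illustrative sketch (the literals $l_1, \dots, l_b$ below are hypothetical candidate refinements, not literals from the Bongard setting): with lookahead 0, refining a clause $c$ yields a pack of the form
\[
  c,\; \big(\, l_1 \;;\; l_2 \;;\; \dots \;;\; l_b \,\big),
\]
i.e., a shared prefix $c$ followed by a single level of $b$ branches; with lookahead 1, each branch $l_i$ is in turn followed by a disjunction over the admissible lookahead literals, so that the part of the pack below $c$ is two levels deep, and so on for larger lookahead values.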
Table 1 gives an overview of the results for the Bongard problems. The total induction time is reported, as well as (for the pack-based execution mechanisms) the time needed for pack compilation and pack execution. Note that the total time includes not only pack compilation and execution, but also all other computations not directly related to packs (e.g., the computation of heuristics from the bit matrix). The results can be interpreted as follows.
First of all, the table shows that significant speedups can be obtained by using the pack mechanism: net speedups of more than a factor of 5.5 are obtained, while the execution itself is up to 75 times faster than disjoint execution.
A further observation is that greater speedups are obtained for more complex target hypotheses. This can be explained by the broom-like form of the packs in TILDE. Complex target hypotheses correspond to deep trees, and refinement of a node at a lower level of such a tree yields a pack with a long clause before the branching, which in accordance with our previous analysis should yield a speedup closer to the branching factor $b$ in the case of lookahead 0 (and more generally, closer to $b^{l+1}$ for lookahead $l$, although the latter is much harder to achieve). Note that the maximum branching factor occurring in each pack is included in a separate column of the table.
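A rough back-of-the-envelope account of this limit (our own sketch, which ignores any sharing among the branch literals themselves): let $t_c$ be the time needed to evaluate the shared clause $c$ once, $t_r$ the total time spent in the branches below $c$, and $N \approx b^{l+1}$ the number of queries in the pack. Disjoint execution re-evaluates $c$ for every query, whereas pack execution evaluates it only once, so
\[
  \text{speedup} \;\approx\; \frac{N\, t_c + t_r}{t_c + t_r},
\]
which tends to $N = b^{l+1}$ when $t_r$ is negligible compared to $t_c$, and to 1 when the branches dominate.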
Finally, deeper packs also yield higher speedups, and this effect is larger for more complex theories. This is understandable considering the following. Let us call the clause that is being refined $c$. With lookahead $l$, conjunctions of up to $l+1$ literals are added to $c$. In some cases the first of these literals may fail immediately, which causes this branch of the pack to have almost no execution time, while cutting away all the queries in the subtree below it. Remember that according to our analysis, the speedup can in the limit approximate $b^{l+1}$ if the complexity of clause $c$ dominates over the complexity of the rest of the pack; such ``early failing branches'' in the pack cause the actual situation to approximate this ideal case more closely.
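As a purely hypothetical numerical illustration of this effect (the figures are invented for the sake of the arithmetic and are not measurements): take $b = 5$ and lookahead $l = 1$, so $N = b^{l+1} = 25$, and suppose $t_c = 10$ and $t_r = 40$ in some arbitrary time unit. The sketch above then gives
\[
  \frac{25 \cdot 10 + 40}{10 + 40} \approx 5.8,
\]
whereas if early failing branches reduce the work below $c$ to $t_r = 5$, the speedup becomes $(250 + 5)/(10 + 5) = 17$, much closer to the limiting value of 25.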
We have also run experiments on the Mutagenesis data set (Table 2), both in a regression and in a classification setting. Here, the query packs are much larger than for the Bongard data set (there is a higher branching factor); with a lookahead of 2 the largest packs contained over 20,000 queries. For these large packs a significant amount of time is spent compiling the pack, but even then clear net speedups are obtained. A comparison of execution times turned out to be infeasible because in the disjoint execution setting the pack structures consumed too much memory.