One component of the algorithm not yet evaluated explicitly is the
candidate generation method. As mentioned in
Section 4.1, we could use fractures of representations
of sentences in which a phrase appears to generate the candidate
meanings for that phrase, instead of LICS. We used this approach and
compared it to the previously described method of using the largest
isomorphic connected subgraphs of sampled pairs of representations as
candidate meanings. To make the comparison fairer, we also
sampled representations for fracturing, using the same number of
source representations as the number of pairs sampled for LICS.
The accuracy of CHILL when using the resulting learned lexicons as
background knowledge is shown in Figure 13. Using
fracturing (fractWOLFIE) shows little or no advantage; none of
the differences between the two systems are statistically significant.
Figure 13: Fracturing vs. LICS: Accuracy
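The matched sampling described above can be sketched as follows. This is a minimal illustration, not the system's actual code: the function names, the placeholder representations, and the choice of uniform sampling without replacement are all assumptions made for the example. It shows only that the number of single representations drawn for fracturing equals the number of pairs drawn for LICS.

```python
import random
from itertools import combinations


def sample_for_lics(representations, n, rng):
    """Sample up to n distinct pairs of representations (hypothetical helper).

    Each pair would be the input to one LICS computation.
    """
    all_pairs = list(combinations(representations, 2))
    return rng.sample(all_pairs, min(n, len(all_pairs)))


def sample_for_fracturing(representations, n, rng):
    """Sample up to n single representations (hypothetical helper).

    Each sampled representation would then be fractured.
    """
    return rng.sample(representations, min(n, len(representations)))


# Placeholder strings stand in for the representations of the sentences
# in which a given phrase appears.
rng = random.Random(0)
reps = [f"rep{i}" for i in range(10)]

pairs = sample_for_lics(reps, 5, rng)
singles = sample_for_fracturing(reps, 5, rng)
print(len(pairs), len(singles))  # both sample sizes are 5
```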
In addition, the number of initial candidate lexicon entries from
which to choose is much larger for fracturing than for our LICS
method, as shown in Figure 14. This is true even
though we sampled the same number of representations as pairs for
LICS, because an arbitrary representation has more fractures than
an arbitrary pair of representations has LICS.
Figure 14: Fracturing vs. LICS: Number of Candidates
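To give a rough sense of why fracturing yields so many candidates, the sketch below counts the connected sub-pieces of a small tree. It rests on a simplifying assumption: a representation is treated as a rooted tree, and a fracture as any connected subtree of it. The precise definition of fracturing used by the system may differ, so the counts are illustrative only. By contrast, a pair of representations contributes only its largest common connected subgraphs, which is usually a far smaller set.

```python
def count_connected_subtrees(tree):
    """Count connected subtrees of a rooted tree given as (label, [children]).

    Returns (rooted_here, total):
      rooted_here: connected subtrees whose topmost node is this node
      total:       connected subtrees anywhere in this tree
    """
    _label, children = tree
    rooted_here = 1  # start with the node by itself
    total = 0
    for child in children:
        child_rooted, child_total = count_connected_subtrees(child)
        # For each child: either omit it, or attach one of the
        # connected subtrees that has the child as its topmost node.
        rooted_here *= 1 + child_rooted
        total += child_total
    return rooted_here, total + rooted_here


def leaf(label):
    """Build a node with no children (hypothetical helper)."""
    return (label, [])


# Two toy three- and four-node shapes with made-up labels.
chain = ("a", [("b", [leaf("c")])])
star = ("a", [leaf("b"), leaf("c"), leaf("d")])

print(count_connected_subtrees(chain)[1])  # 6
print(count_connected_subtrees(star)[1])   # 11
```

Under this assumption the count grows quickly with branching: a node with k leaf children already has 2^k connected subtrees rooted at it.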
Finally, WOLFIE takes longer to learn when using fracturing than
when using LICS, as shown in Figure 15, which reports CPU time
in seconds.
Figure 15: Fracturing vs. LICS: Learning Time
In summary, these results show the utility of LICS as a method for
generating candidates: a more thorough method does not result in
better performance, and also results in longer learning times. One
could claim that we are handicapping fracturing since we are only
sampling representations for fracturing. Fracturing every
representation might indeed improve accuracy, but the learning time
and the number of candidates would likely suffer even further. In a
domain with larger representations, the differences in learning time
would be even more dramatic.
Cindi Thompson
2003-01-02