

Smoothing

We applied the smoothing method described in Section 4 to both datasets in order to determine to what extent the clustering of terms improves the results of the FCA-based approach. As information measure we used the conditional probability in this experiment, as it performs reasonably well (see Section 6.2). In particular, we used the following similarity measures: the cosine measure, the Jaccard coefficient, the L1 norm, as well as the Jensen-Shannon and Skew divergences (cf. [37]). Table 6 shows the impact of this smoothing technique in terms of the number of object/attribute pairs added to the dataset. The Skew divergence is excluded because it did not yield any mutually similar terms. It can be observed that smoothing by mutual similarity based on the cosine measure produces the most previously unseen object/attribute pairs, followed by the Jaccard coefficient, the L1 norm and the Jensen-Shannon divergence (in this order). Table 7 shows the results for the different similarity measures, and the tables in Appendix A list the mutually similar terms for the different domains and similarity measures. The results show that our smoothing technique actually yields worse results on both domains and for all similarity measures used.
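For illustration, the following minimal Python sketch shows how smoothing by mutual similarity can be realized over term/attribute counts. It is written under our own assumptions (term contexts stored as count dictionaries, attributes being e.g. verb contexts; all function names are hypothetical) and is not the implementation used in the experiments.

    import math

    def probs(counts):
        """Conditional probabilities P(attribute | term) from raw attribute counts."""
        total = sum(counts.values())
        return {a: c / total for a, c in counts.items()}

    def cosine(p, q):
        num = sum(v * q.get(a, 0.0) for a, v in p.items())
        den = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
        return num / den if den else 0.0

    def jaccard(p, q):
        """Jaccard coefficient over the sets of attributes observed with each term."""
        s, t = set(p), set(q)
        return len(s & t) / len(s | t) if s | t else 0.0

    def l1_sim(p, q):
        """L1 distance between the two distributions, turned into a similarity in [0, 1]."""
        dist = sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in set(p) | set(q))
        return 1.0 - dist / 2.0

    def kl(p, q):
        # KL divergence; only called here with q covering the support of p (the mixture m).
        return sum(v * math.log(v / q[a]) for a, v in p.items() if v > 0.0)

    def jensen_shannon_sim(p, q):
        """Negated Jensen-Shannon divergence, so that higher values mean more similar."""
        m = {a: 0.5 * (p.get(a, 0.0) + q.get(a, 0.0)) for a in set(p) | set(q)}
        return -(0.5 * kl(p, m) + 0.5 * kl(q, m))

    def nearest(term, dists, sim):
        """Most similar other term under the given similarity measure."""
        return max((t for t in dists if t != term), key=lambda t: sim(dists[term], dists[t]))

    def mutually_similar(dists, sim):
        """Pairs of terms which are each other's nearest neighbour."""
        pairs = set()
        for t1 in dists:
            t2 = nearest(t1, dists, sim)
            if nearest(t2, dists, sim) == t1:
                pairs.add(frozenset((t1, t2)))
        return pairs

    def smooth(contexts, pairs):
        """Add to each term the (previously unseen) attributes of its mutually similar partner."""
        smoothed = {t: set(attrs) for t, attrs in contexts.items()}
        for pair in pairs:
            t1, t2 = tuple(pair)
            smoothed[t1] |= set(contexts[t2])
            smoothed[t2] |= set(contexts[t1])
        return smoothed

    # Usage sketch: counts maps each term to its attribute counts from the corpus.
    # dists = {t: probs(c) for t, c in counts.items()}
    # pairs = mutually_similar(dists, cosine)   # or jaccard, l1_sim, jensen_shannon_sim
    # new_context = smooth({t: set(c) for t, c in counts.items()}, pairs)

The counts of new object/attribute pairs reported in Table 6 correspond to the attributes added in the last step that were not already present for the respective term.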

Table 6: Impact of the smoothing technique in terms of new object/attribute pairs

            Baseline   Jaccard            Cosine             L1                 JS
  Tourism   525912     531041 (+5129)     534709 (+8797)     530695 (+4783)     528892 (+2980)
  Finance   577607     599691 (+22084)    634954 (+57347)    584821 (+7214)     583526 (+5919)



Table 7: Results of smoothing in terms of F-measure F'

            Baseline   Jaccard   Cosine    L1        JS
  Tourism   44.69%     39.54%    41.81%    41.59%    42.35%
  Finance   38.85%     38.63%    36.69%    38.48%    38.66%


