We applied the smoothing method described in Section 4 to
both datasets to determine to what extent the clustering of terms
improves the results of the FCA-based approach. As the information measure, we
use conditional probability in this experiment, since it
performs reasonably well, as shown in Section 6.2.
In particular, we used the following similarity measures: the cosine
measure, the Jaccard coefficient, the L1 norm, and the
Jensen-Shannon and skew divergences (compare [37]).
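For illustration, the five measures can be sketched as follows. This is a minimal sketch, not the implementation used in the experiments: it assumes that each term is represented by a probability distribution over a fixed list of attributes, that the Jaccard coefficient is computed on the sets of attributes with non-zero probability, and that the skew parameter is alpha = 0.99. All function names are chosen here for illustration. Note that the L1 norm and the two divergences are distances, so smaller values indicate more similar terms.

```python
import math

def cosine(p, q):
    """Cosine of the angle between two attribute vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def jaccard(p, q):
    """Jaccard coefficient on the attributes with non-zero weight (assumption)."""
    a = {i for i, x in enumerate(p) if x > 0}
    b = {i for i, x in enumerate(q) if x > 0}
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def l1_distance(p, q):
    """L1 norm of the difference between two distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))

def _kl(p, q):
    """Kullback-Leibler divergence; q must be non-zero wherever p is."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def jensen_shannon(p, q):
    """Average KL divergence of p and q to their midpoint distribution."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)

def skew_divergence(p, q, alpha=0.99):
    """KL divergence of p to q smoothed with p; alpha < 1 is an assumed default."""
    mix = [alpha * b + (1 - alpha) * a for a, b in zip(p, q)]
    return _kl(p, mix)
```

Mixing q with p in the skew divergence keeps the logarithm defined even when q assigns zero probability to an attribute that p has seen.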
Table 6 shows the impact of this smoothing technique
in terms of the number of object/attribute pairs added to the dataset.
The skew divergence is excluded because it did not
yield any mutually similar terms.
Smoothing by mutual similarity based on
the cosine measure produces the most previously unseen object/attribute pairs,
followed by the Jaccard coefficient, the L1 norm, and the Jensen-Shannon divergence (in this order).
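The following sketch illustrates how smoothing by mutual similarity can add object/attribute pairs. It rests on assumptions that are not restated in this section: two terms are taken to be mutually similar if each is the most similar term of the other, and smoothing is taken to give both terms the union of their attributes. The toy context, the set-based Jaccard coefficient, and all function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient of two attribute sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mutually_similar_pairs(context, sim=jaccard):
    """Pairs of terms that are each other's most similar term (assumed definition)."""
    nearest = {}
    for t in context:
        others = [u for u in context if u != t]
        if others:
            nearest[t] = max(others, key=lambda u: sim(context[t], context[u]))
    pairs = {tuple(sorted((t, u))) for t, u in nearest.items() if nearest.get(u) == t}
    return sorted(pairs)

def smooth(context, pairs):
    """Give both members of each pair the union of their attributes."""
    smoothed = {t: set(attrs) for t, attrs in context.items()}
    for a, b in pairs:
        merged = context[a] | context[b]
        smoothed[a] = set(merged)
        smoothed[b] = set(merged)
    return smoothed

def count_pairs(context):
    """Number of object/attribute pairs in a context."""
    return sum(len(attrs) for attrs in context.values())

# Hypothetical toy context: term -> attributes observed in the corpus.
context = {
    "hotel": {"book", "stay_in"},
    "apartment": {"book", "stay_in", "rent"},
    "car": {"rent", "drive"},
    "bike": {"rent", "drive", "ride"},
}
pairs = mutually_similar_pairs(context)
smoothed = smooth(context, pairs)
added = count_pairs(smoothed) - count_pairs(context)
```

In this toy context, hotel/apartment and car/bike are mutually similar, so hotel gains the attribute rent and car gains ride. These two new pairs are the kind of addition counted in parentheses in Table 6.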
Table 7 shows the results for the different similarity
measures. The tables in Appendix A list
the mutually similar terms for the different domains and similarity measures.
The results show that our smoothing technique actually performs worse than the unsmoothed baseline
in both domains and for all similarity measures used.
Table 6: Impact of Smoothing Technique in terms of new object/attribute pairs

          Baseline   Jaccard            Cosine             L1                JS
Tourism   525912     531041 (+5129)     534709 (+8797)     530695 (+4783)    528892 (+2980)
Finance   577607     599691 (+22084)    634954 (+57347)    584821 (+7214)    583526 (+5919)

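To put the absolute counts of Table 6 into perspective, the additions can be expressed relative to the baseline. The snippet below is a small arithmetic sketch using only the figures from the table; the variable and function names are chosen here for illustration.

```python
# Baseline number of object/attribute pairs per domain (Table 6).
baseline = {"Tourism": 525912, "Finance": 577607}

# Pairs added by smoothing, per domain and similarity measure (Table 6).
added = {
    "Tourism": {"Jaccard": 5129, "Cosine": 8797, "L1": 4783, "JS": 2980},
    "Finance": {"Jaccard": 22084, "Cosine": 57347, "L1": 7214, "JS": 5919},
}

def relative_increase(domain, measure):
    """Added pairs as a percentage of the baseline for that domain."""
    return 100.0 * added[domain][measure] / baseline[domain]

for domain in baseline:
    for measure in added[domain]:
        print(f"{domain:8s} {measure:8s} +{relative_increase(domain, measure):.2f}%")
```

Even for the cosine measure, the relative increase stays below 2% on the tourism dataset and just below 10% on the finance dataset.
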
Table 7: Results of Smoothing in terms of F-Measure F'