In order to evaluate our approach, we need to assess how well the automatically
learned ontologies reflect a given domain. One possibility would be to compute
how many of the superconcept relations in the automatically learned ontology
are correct, as done for example by [33] or [8]. However, since our approach, like many others (compare [34,46,30]), does not produce appropriate names for the abstract concepts it generates,
it seems difficult to assess the validity of a given superconcept relation.
Another possibility is to compute how 'similar' the automatically learned concept
hierarchy is to a given hierarchy for the domain in question. Here
the crucial question is how to define similarity between concept hierarchies.
Though there is a great amount of work in the AI community on how to compute the
similarity between trees [63,28], concept lattices [4], conceptual graphs [42,45] and (plain) graphs [11,64], it is not clear how these
similarity measures translate to concept hierarchies. An interesting work along
these lines is the one presented by [40], in which ontologies are
compared along different levels: semiotic, syntactic and pragmatic. In particular,
the authors present measures to compare the lexical and taxonomic overlap
between two ontologies.
Furthermore, they also present an interesting study in which different subjects
were asked to model a tourism ontology. The resulting ontologies are compared in terms
of the defined similarity measures, thus quantifying the agreement of different
subjects on the task of modeling an ontology.
In order to formally define our evaluation measures, we introduce a
core ontology model in line with the ontological model presented by
[59]:
Definition 6 (Core Ontology)
A core ontology is a structure $O := (C, root, \leq_C)$ consisting of (i)
a set $C$ of concept identifiers, (ii) a designated root element
representing the top element of the (iii) partial order $\leq_C$
on $C \cup \{root\}$ such that $\forall c \in C: c \leq_C root$,
called concept hierarchy or taxonomy.
For the sake of notational simplicity we adopt the following convention: given
an ontology $O_i$, the corresponding set of concepts will be denoted
by $C_i$ and the partial order representing the concept hierarchy
by $\leq_{C_i}$.
It is important to mention that in the approach presented
here, terms are directly identified with concepts, i.e. we
neglect the fact that terms can be polysemous.
Now, the Lexical Recall ($LR$) of two ontologies $O_1$ and $O_2$ is measured as follows:

$$LR(O_1, O_2) := \frac{|C_1 \cap C_2|}{|C_2|}$$
Take for example the concept hierarchies $O_1$ and $O_2$ depicted in
Figure 4. In this example, the Lexical Recall is the number of concepts
shared by both hierarchies divided by the number of concepts in the
reference hierarchy $O_2$.
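As a minimal sketch, the Lexical Recall can be computed in a few lines of Python; the concept sets below are hypothetical and not those of Figure 4:

```python
# A minimal sketch of the Lexical Recall LR(O1, O2) = |C1 ∩ C2| / |C2|.
# The concept sets are hypothetical, not those of Figure 4.

def lexical_recall(learned_concepts, reference_concepts):
    """Fraction of reference concepts that also occur in the learned ontology."""
    learned = set(learned_concepts)
    reference = set(reference_concepts)
    return len(learned & reference) / len(reference)

c1 = {"bike", "car", "driveable", "rideable", "rentable"}  # learned (C1)
c2 = {"bike", "car", "vehicle", "two-wheeled vehicle"}     # reference (C2)
print(lexical_recall(c1, c2))  # 2 shared concepts out of 4 -> 0.5
```

Note that the measure is asymmetric: it is a recall with respect to the reference ontology $O_2$ only.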
Figure 4:
Example for an automatically acquired concept hierarchy (left)
compared to the reference concept hierarchy (right)
Figure 5:
Example for a perfectly learned concept hierarchy (left)
compared to the reference concept hierarchy (right)
In order to compare the taxonomies of two ontologies, we use the
Semantic Cotopy ($SC$) presented by [40].
The Semantic Cotopy of a concept $c$ is defined as the set of all its super- and
subconcepts:

$$SC(c, O) := \{c' \in C \mid c' \leq_C c \lor c \leq_C c'\}$$
In what follows we illustrate these and other definitions on the basis
of several example concept hierarchies. Take for instance the concept
hierarchies in Figure 5. We assume that the left concept
hierarchy has been automatically learned with our FCA
approach and that the concept hierarchy on the right is a handcrafted
one. Further, it is important to point out that the left ontology is, in terms
of the arrangement of the leaf nodes and abstracting from the labels
of the inner nodes, a perfectly learned concept hierarchy. This should
thus be reflected by a maximum similarity between both ontologies.
The Semantic Cotopy of the concept vehicle in the right ontology
in Figure 5 is for example {car, bike, two-wheeled vehicle, vehicle, object-to-rent} and the Semantic Cotopy of driveable in the left ontology is {bike, car, rideable, driveable, rentable, bookable}.
It thus already becomes clear that comparing the cotopies of both
concepts will not yield the desired result, i.e. a maximum similarity
between both concepts. We therefore use a modified version $SC'$ of the Semantic Cotopy
in which we only consider the concepts common to both concept hierarchies
(compare [14,15]), i.e.

$$SC'(c, O_1, O_2) := \{c' \in C_1 \cap C_2 \mid c' \leq_{C_1} c \lor c \leq_{C_1} c'\}$$
By using this Common Semantic Cotopy we thus exclude from the comparison concepts such as runable, offerable, needable, activity, vehicle etc., which appear in only one of the two ontologies.
So, the Common Semantic Cotopy of the concepts vehicle and driveable
is identical in both ontologies in Figure 5, i.e. {bike, car}
thus representing a perfect overlap between both concepts, which
certainly corresponds to our intuitions about the similarity of
both concepts. However, let's now consider the concept hierarchy in
Figure 6. The common cotopy of the concept bike is {bike} in both concept
hierarchies. In fact, every leaf concept in the left concept hierarchy
has a maximum overlap with the corresponding concept in the right ontology.
This is certainly undesirable and in fact leads to very high baselines
when comparing such trivial concept hierarchies with a reference
standard (compare our earlier results [14,15]).
Thus, we introduce a further modification $SC''$ of the Semantic Cotopy by
excluding the concept itself from its Common Semantic Cotopy, i.e.:

$$SC''(c, O_1, O_2) := \{c' \in C_1 \cap C_2 \mid c' <_{C_1} c \lor c <_{C_1} c'\}$$
This maintains the perfect overlap between vehicle and driveable
in the concept hierarchies in Figure 5, while yielding empty common
cotopies for all the leaf concepts in the left ontology of Figure 6.
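A sketch of the Common Semantic Cotopy $SC''$ in the same child-to-parent encoding; the left hierarchy below is an assumed approximation of Figure 5, built only from the concept names mentioned above:

```python
# Sketch of SC''(c, O1, O2): super-/subconcepts of c restricted to concepts
# occurring in BOTH hierarchies, with c itself excluded.

def ancestors(c, parent):
    out = set()
    while c in parent:
        c = parent[c]
        out.add(c)
    return out

def concepts(parent):
    return set(parent) | set(parent.values())

def common_cotopy(c, p1, p2):
    """SC''(c, O1, O2) for a concept c of the first hierarchy."""
    shared = concepts(p1) & concepts(p2)
    subs = {x for x in concepts(p1) if c in ancestors(x, p1)}
    return ((ancestors(c, p1) | subs) & shared) - {c}

# Assumed edges for the learned (left) and reference (right) hierarchies.
left = {"rentable": "bookable", "driveable": "rentable",
        "rideable": "driveable", "car": "driveable", "bike": "rideable"}
right = {"vehicle": "object-to-rent", "two-wheeled vehicle": "vehicle",
         "car": "vehicle", "bike": "two-wheeled vehicle"}

print(sorted(common_cotopy("driveable", left, right)))  # -> ['bike', 'car']
print(sorted(common_cotopy("vehicle", right, left)))    # -> ['bike', 'car']
```

Both concepts end up with the identical common cotopy {bike, car}, mirroring the perfect overlap discussed above.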
Figure 6:
Example for a trivial concept hierarchy (left)
compared to the reference concept hierarchy (right)
Now, according to Maedche et al. the Taxonomic Overlap ($TO$) of two ontologies $O_1$ and $O_2$ is computed as follows:

$$TO(O_1, O_2) := \frac{1}{|C_1|} \sum_{c \in C_1} TO(c, O_1, O_2)$$

where

$$TO(c, O_1, O_2) := \begin{cases} TO'(c, O_1, O_2) & \text{if } c \in C_2\\ TO''(c, O_1, O_2) & \text{if } c \notin C_2 \end{cases}$$

and $TO'$ and $TO''$ are defined as follows:

$$TO'(c, O_1, O_2) := \frac{|SC'(c, O_1, O_2) \cap SC'(c, O_2, O_1)|}{|SC'(c, O_1, O_2) \cup SC'(c, O_2, O_1)|}$$

$$TO''(c, O_1, O_2) := \max_{c' \in C_2} \frac{|SC'(c, O_1, O_2) \cap SC'(c', O_2, O_1)|}{|SC'(c, O_1, O_2) \cup SC'(c', O_2, O_1)|}$$
So, $TO'$ gives the similarity between concepts which are in both ontologies
by comparing their respective semantic cotopies. In contrast, $TO''$ gives
the similarity between a concept $c \in C_1 \setminus C_2$ and the
concept $c' \in C_2$ which maximizes the overlap of the respective semantic
cotopies, i.e. it makes an optimistic estimation, assuming an overlap that just does not happen to show up at the immediate lexical surface (compare [40]). The Taxonomic Overlap
between the two ontologies is then calculated by averaging over the taxonomic
overlaps of all the concepts in $C_1$.
In our case it does not make sense to calculate the Semantic Cotopy for
concepts which are in both ontologies, as these represent leaf nodes and
their common semantic cotopies are thus empty. Therefore, we calculate
the Taxonomic Overlap between two ontologies as follows:

$$TO''(O_1, O_2) := \frac{1}{|C_1 \setminus C_2|} \sum_{c \in C_1 \setminus C_2} \max_{c' \in C_2} \frac{|SC''(c, O_1, O_2) \cap SC''(c', O_2, O_1)|}{|SC''(c, O_1, O_2) \cup SC''(c', O_2, O_1)|}$$
Finally, as we do not want to compute the Taxonomic Overlap in one
direction only, we introduce the precision, the recall and an F-Measure calculating
the harmonic mean of both:

$$P(O_1, O_2) := TO''(O_1, O_2)$$
$$R(O_1, O_2) := TO''(O_2, O_1)$$
$$F(O_1, O_2) := \frac{2 \cdot P(O_1, O_2) \cdot R(O_1, O_2)}{P(O_1, O_2) + R(O_1, O_2)}$$

The importance of balancing recall and
precision against each other will become clear in the discussion of a
few examples below.
Let's consider for example the concept hierarchy $O_1$ in Figure
5. For the five concepts bookable, joinable, rentable,
driveable and rideable we find a corresponding concept in $O_2$
with a maximum Taxonomic Overlap, and the other way round
for the concepts activity, object-to-rent, vehicle and two-wheeled-vehicle in $O_2$, such that
$P(O_1, O_2) = R(O_1, O_2) = F(O_1, O_2) = 100\%$.
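To make the interplay of $P$, $R$ and $F$ concrete, the following Python sketch implements the $SC''$-based Taxonomic Overlap on two small hypothetical hierarchies; the concept names echo the figures, but the edges and the child-to-parent encoding are assumptions for illustration:

```python
# Sketch of taxonomic precision/recall/F-measure based on the common
# semantic cotopy SC'' (shared concepts only, the concept itself excluded).
# Hierarchies are encoded as child -> parent mappings (an assumption).

def ancestors(c, parent):
    out = set()
    while c in parent:
        c = parent[c]
        out.add(c)
    return out

def concepts(parent):
    return set(parent) | set(parent.values())

def common_cotopy(c, p1, p2):
    shared = concepts(p1) & concepts(p2)
    subs = {x for x in concepts(p1) if c in ancestors(x, p1)}
    return ((ancestors(c, p1) | subs) & shared) - {c}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def taxonomic_overlap(p1, p2):
    """TO''(O1, O2): average best cotopy overlap for concepts only in O1."""
    only1 = concepts(p1) - concepts(p2)
    return sum(
        max(jaccard(common_cotopy(c, p1, p2), common_cotopy(c2, p2, p1))
            for c2 in concepts(p2))
        for c in only1
    ) / len(only1)

def f_measure(p1, p2):
    p, r = taxonomic_overlap(p1, p2), taxonomic_overlap(p2, p1)
    return p, r, 2 * p * r / (p + r)

learned = {"driveable": "rentable", "rideable": "driveable",
           "car": "driveable", "bike": "rideable"}
reference = {"vehicle": "object-to-rent",
             "two-wheeled vehicle": "vehicle",
             "car": "vehicle", "bike": "two-wheeled vehicle"}

print(f_measure(learned, reference))  # structurally perfect -> (1.0, 1.0, 1.0)

# Removing 'rideable' (analogous to the lower-recall example) keeps P
# at 1.0 but lowers R, and with it the F-Measure.
learned_pruned = {"driveable": "rentable",
                  "car": "driveable", "bike": "driveable"}
print(f_measure(learned_pruned, reference))  # P = 1.0, R = 5/6, F = 10/11
```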
Figure 7:
Example for a concept hierarchy with lower recall
compared to the reference concept hierarchy
Figure 8:
Example for a concept hierarchy with lower precision
compared to the reference concept hierarchy
In the concept hierarchy
shown in Figure 7 the precision is still 100% for the same reasons as above, but because the
rideable concept has been removed, there is no
corresponding concept for two-wheeled-vehicle. The concept maximizing
the taxonomic similarity with two-wheeled-vehicle in $O_1$ is
driveable, with a Taxonomic Overlap
of 0.5. The recall is thus
$R(O_1, O_2) = \frac{3 \cdot 1 + 0.5}{4} = 87.5\%$
and the F-Measure decreases to
$F(O_1, O_2) = \frac{2 \cdot 1 \cdot 0.875}{1 + 0.875} \approx 93.33\%$.
In the concept hierarchy of $O_1$ in
Figure 8, an additional concept planable has been
introduced, which reduces the precision, while the recall obviously stays the same,
and thus the F-Measure also decreases. It thus becomes clear why it is important to measure
both the precision and the recall of the automatically learned concept hierarchies
and to balance them against each other by the harmonic mean or F-Measure.
For the automatically learned concept hierarchy in Figure
4, the precision, the recall and the resulting F-Measure are computed in
exactly the same way.
As a comparison, for the trivial concept hierarchy in Figure
6 we get a maximum precision (per definition), but a lower recall and
consequently a lower F-Measure.
It is important to mention
that, though in our toy examples the difference between the automatically
learned concept hierarchy and the trivial concept hierarchy with respect
to these measures is not very big, when
considering real-world concept hierarchies with a much larger number
of concepts the F-Measures
for trivial concept hierarchies will be very low (see the results
in Section 6).
Finally, we also calculate the harmonic mean of the Lexical Recall
and the F-Measure as follows:

$$F'(O_1, O_2) := \frac{2 \cdot LR(O_1, O_2) \cdot F(O_1, O_2)}{LR(O_1, O_2) + F(O_1, O_2)}$$

For the automatically learned concept hierarchy $O_1$ in Figure 4, for
example, $F'$ is obtained by inserting the values of $LR$ and $F$
computed above.
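A minimal helper for this final combination (the input values are purely illustrative, not those of Figure 4):

```python
# Harmonic mean of Lexical Recall and taxonomic F-Measure, as defined above.
def harmonic_mean(a, b):
    return 2 * a * b / (a + b) if a + b else 0.0

lr, f = 0.5, 0.8  # hypothetical values for illustration
print(round(harmonic_mean(lr, f), 4))  # -> 0.6154
```

As with any harmonic mean, the combined score is dominated by the weaker of the two components, so an ontology cannot compensate for very poor lexical coverage with good taxonomic structure or vice versa.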