Next: Text Processing Up: Learning Concept Hierarchies from Previous: Overall Process

Formal Concept Analysis

Formal Concept Analysis (FCA) is a method mainly used for the analysis of data, i.e. for deriving implicit relationships between objects described through a set of attributes on the one hand and these attributes on the other. The data are structured into units which are formal abstractions of concepts of human thought, allowing meaningful comprehensible interpretation [26]. Thus, FCA can be seen as a conceptual clustering technique as it also provides intensional descriptions for the abstract concepts or data units it produces. Central to FCA is the notion of a formal context:

Definition 1 (Formal Context)
A triple (

) is called a formal context if

and

are sets and $I \subseteq G \times M$ is a binary relation between

and

. The elements of

are called objects, those of

attributes and I is the incidence of the context.

For $A \subseteq G$ , we define: $A':=\{m \in M \vert \forall g \in A: (g,m) \in I\}$

and dually for $B \subseteq M$ : $B':=\{g \in G \vert \forall m \in B: (g,m) \in I \}$
Intuitively speaking,

is the set of all attributes common to the objects of

, while

is the set of all objects that have all attributes in

. Furthermore, we define what a formal concept is:

Definition 2 (Formal Concept)
A pair (

) is a formal concept of (

) if and only if $A \subseteq G, B \subseteq M, A'=B \mbox{and} A=B'$ .

In other words, (

) is a formal concept if the set of all attributes shared by the objects of

is identical with

and on the other hand

is also the set of all objects that have all attributes in

is then called the extent and

the intent of the formal concept (

). The formal concepts of a given context are naturally ordered by the subconcept-superconcept relation as defined by:

$\displaystyle (A_1,B_1) \leq (A_2,B_2) \Leftrightarrow A_1 \subseteq A_2 (\Leftrightarrow B_2 \subseteq B_1)$

Thus, formal concepts are partially ordered with regard to inclusion of their extents or (which is equivalent) inverse inclusion of their intent. We now give some examples to illustrate our definitions. In the context of the tourism domain one knows for example that things like a hotel, an apartment, a car, a bike, a trip or an excursion can be booked. Furthermore, we know that we can rent a car, a bike or an apartment. Moreover, we can drive a car or a bike, but only ride a bike³. In addition, we know that we can join an excursion or a trip. We can now represent the formal context corresponding to this knowledge as a formal context (see Table 1). The lattice produced by FCA is depicted in Figure 2 (left)⁴. It can be transformed into a special type of concept hierarchy as shown in Figure 2 (right) by removing the bottom element, introducing an ontological concept for each formal concept (named with the intent) and introducing a subconcept for each element in the extent of the formal concept in question. In order to formally define the transformation of the lattice $(\mathfrak{B},\leq)$ into the partial order $(C',\leq')$ , we assume that the lattice is represented using reduced labeling. Reduced labeling as defined in [26] means that objects are in the extension of the most specific concept and attributes conversely in the intension of the most general one. This reduced labeling is achieved by introducing functions $\gamma$ and $\mu$ . In particular, the name of an object

is attached to the lower half of the corresponding object concept, i.e. $\gamma(g):=(\{g\}'',\{g\}')$ , while the name of attribute

is located at the upper half of the attribute concept, i.e. $\mu(m):=(\{m\}',\{m\}'')$ . Now given a lattice $(\mathfrak{B},\leq)$ of formal concepts for a formal context

, we transform it into a partial order $(C',\leq')$ as follows:

Definition 3 (Transformation of $(\mathfrak{B},\leq)$ to $(C',\leq')$ )
First of all

contains objects as well as intents (sets of attributes):

$\begin{displaymath}C' := G \cup \{B \vert (A,B) \in \mathfrak{B}\}\end{displaymath}$

Further:

$\begin{displaymath}\leq' := \{(g,B_1) \vert \gamma(g)=(A_1,B_1)\} \cup \{(B_1,B_2) \vert (A_1,B_1) \leq (A_2,B_2)\}\end{displaymath}$

Finally, as FCA typically produces a high number of concepts, we compress the resulting hierarchy of ontological concepts by removing any inner node whose extension in terms of leave nodes subsumed is the same as the one of its child, i.e. we create a partial order $(C'',\leq_C'')$ as follows:

Definition 4 (Compacted Concept Hierarchy $(C'',\leq'')$ )
Assuming that

is the set of leave nodes dominated by

according to $\leq_C'$ :

$\begin{displaymath}C'' := \{c_2 \in C' \vert \forall c_1 \in C' c_2 \leq_C' c_1 \rightarrow extension(c_2) \ne extension(c_1)\}\end{displaymath}$

Further:

$\begin{displaymath}\leq_C'' := \leq_C' \vert _{C'' \times C''}\end{displaymath}$

i.e. $\leq_C''$ is the relation $\leq_C'$ restricted to pairs of elements of

In particular for the hierarchy in figure 2 (right) we would remove the rideable concept.

Table 1: Tourism domain knowledge as formal context

	bookable	rentable	driveable	rideable	joinable
hotel	x
apartment	x	x
car	x	x	x
bike	x	x	x	x
excursion	x				x
trip	x				x

**Figure 2:** The lattice of formal concepts (left) and the corresponding hierarchy of ontological concepts (right) for the tourism example
$\begin{figure} \begin{center} \epsfig{file=tourismlattice.eps, width=5cm} \epsfig{file=hierarchy.eps, width=6cm} \end{center} \end{figure}$

At a first glance, it seems that the hierarchy shown in Figure 2 (right) is somehow odd due to the fact that the labels of abstract concepts are verbs rather than nouns as typically assumed. However, from a formal point of view, concept identifiers have no meaning at all so that we could have just named the concepts with some other arbitrary symbols. The reason why it is handy to introduce 'meaningful' concept identifiers is for the purpose of easier human readability. In fact, if we adopt an extensional interpretation of our hierarchy, we have no problems asserting that the extension of the concept denoted by bike is a subset of the extension of the concept of the rideable objects in our world. This view is totally compatible with interpreting the concept hierarchy in terms of formal subsumption as given by the logical formula: $\forall x (bike(x) \rightarrow rideable(x))$ . We thus conclude that from an extensional point of view the 'verb-like' concept identifiers have the same status as any concept label based on a noun. From an intensional point of view, there may not even exist a hypernym with the adequate intension to label a certain abstract concept, such that using a verb-like identifier may even be the most appropriate choice. For example, we could easily replace the identifiers joinable, rideable and driveable by activity, two-wheeled vehicle and vehicle, respectively. However, it is certainly difficult to substitute rentable by some 'meaningful' term denoting the same extension, i.e. all the things that can be rented. It is also important to mention that the learned concept hierarchies represent a conceptualization of a domain with respect to a given corpus in the sense that they represent the relations between terms as they are used in the text. However, corpora represent a very limited view of the world or a certain domain due to the fact that if something is not mentioned, it does not mean that it is not relevant, but simply that it is not an issue for the text in question. This also leads to the fact that certain similarities between terms with respect to the corpus are actually accidental, in the sense that they do not map to a corresponding semantic relation, and which are due to the fact that texts represent an arbitrary snapshot of a domain. Thus, the learned concept hierarchies have to be merely regarded as approximations of the conceptualization of a certain domain. The task we are now focusing on is: given a certain number of terms referring to concepts relevant for the domain in question, can we derive a concept hierarchy between them? In terms of FCA, the objects are thus given and we need to find the corresponding attributes in order to build an incidence matrix, a lattice and then transform it into a corresponding concept hierarchy. In the following section, we describe how we acquire these attributes automatically from the underlying text collection.

Next: Text Processing Up: Learning Concept Hierarchies from Previous: Overall Process

Philipp Cimiano 2005-08-04