The underlying hypothesis of this knowledge-based method is that the more similar two words are, the more information their concepts share. Here, the information shared by several concepts is indicated by the most specific concept that subsumes them in the taxonomy.
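To make the notion of "most specific subsuming concept" concrete, the following minimal sketch walks a small hand-made IS-A taxonomy. The concept names, the single-parent assumption, and the helper names `path_to_root` and `most_specific_subsumer` are illustrative assumptions and are not taken from WordNet or from the method described here.

```python
# Toy IS-A taxonomy: each concept maps to its single parent (hypernym).
# Hypothetical data for illustration only; not the real WordNet hierarchy.
HYPERNYM = {
    "oak": "tree",
    "pine": "tree",
    "tree": "woody_plant",
    "shrub": "woody_plant",
    "woody_plant": "plant",
    "herb": "plant",
    "plant": "organism",
    "animal": "organism",
    "organism": None,  # root of the toy taxonomy
}


def path_to_root(concept):
    """Return the list [concept, parent, grandparent, ..., root]."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = HYPERNYM[concept]
    return path


def most_specific_subsumer(a, b):
    """Return the deepest concept that subsumes both a and b, or None."""
    ancestors_of_b = set(path_to_root(b))
    # Walk upward from a; the first shared ancestor is the most specific one.
    for concept in path_to_root(a):
        if concept in ancestors_of_b:
            return concept
    return None
```

In this toy taxonomy, `most_specific_subsumer("oak", "pine")` returns `"tree"`, whereas `most_specific_subsumer("oak", "animal")` only meets at the root `"organism"`, reflecting that the second pair shares less information.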
The input for this WSD module is a group of nouns in a context.
Each word is looked up in WordNet, where it has an associated set
of possible senses, and each sense has a set of concepts in the
IS-A taxonomy (hypernymy/hyponymy relations).
First, the method finds the concept common to all the senses of
the words that form the context. This concept is taken as the
initial specification mark (ISM). If this initial specification
mark does not resolve the ambiguity of a word, we descend
through the WordNet hierarchy, one level at a time,
assigning new specification marks. For each
specification mark, the number of context words that have a
concept within its subhierarchy is then counted.
The sense that corresponds to the specification mark with the
highest number of words is the one chosen for the word within
the given context. Figure 1 illustrates how the word plant,
which has four different senses, is disambiguated in a context
that also contains the words tree, perennial, and leaf. It
can be seen that the initial specification mark does not resolve
the lexical ambiguity, since the word plant appears in
two subhierarchies with different senses. The specification mark
identified by {plant#2, flora#2}, however, contains
the highest number of words (three) from the context and is
therefore the one chosen, which resolves the word plant to
sense two. The words tree and perennial are also disambiguated,
both to sense one. The word leaf does not appear in the
subhierarchy of the specification mark {plant#2, flora#2}, and
is therefore left undisambiguated. Such words are beyond the
scope of the disambiguation algorithm. They are set aside to be
processed by a complementary set of heuristics (see
section 3.1.2).
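The descent through specification marks described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the taxonomy is a hypothetical toy tree rather than the real WordNet hierarchy, the sense labels only mimic the plant example, the descent greedily follows the child mark covering the most context words, and a tie between the best marks is treated as "not disambiguated". The names `PARENT`, `SENSES`, `words_under`, and `disambiguate` are invented for this sketch.

```python
# Hypothetical toy IS-A taxonomy (child -> parent); NOT the real WordNet tree.
PARENT = {
    "life_form": "entity", "artifact": "entity", "abstraction": "entity",
    "natural_object": "entity", "substance": "entity",
    "plant#2": "life_form", "person": "life_form", "plant#4": "person",
    "woody_plant": "plant#2", "tree#1": "woody_plant",
    "perennial#1": "plant#2",
    "plant#1": "artifact", "plant#3": "abstraction", "tree#2": "abstraction",
    "leaf#1": "natural_object", "leaf#2": "substance",
}
ROOT = "entity"  # plays the role of the initial specification mark (ISM)

# Assumed sense inventory for the context words of the plant example.
SENSES = {
    "plant": ["plant#1", "plant#2", "plant#3", "plant#4"],
    "tree": ["tree#1", "tree#2"],
    "perennial": ["perennial#1"],
    "leaf": ["leaf#1", "leaf#2"],
}
CONTEXT = ["plant", "tree", "perennial", "leaf"]


def subsumes(mark, concept):
    """True if `concept` lies in the subhierarchy rooted at `mark`."""
    while concept is not None:
        if concept == mark:
            return True
        concept = PARENT.get(concept)
    return False


def words_under(mark, context):
    """Context words with at least one sense inside the subhierarchy of `mark`."""
    return {w for w in context if any(subsumes(mark, s) for s in SENSES[w])}


def disambiguate(word, context):
    """Descend from the ISM until a mark holds exactly one sense of `word`."""
    mark = ROOT
    while True:
        inside = [s for s in SENSES[word] if subsumes(mark, s)]
        if not inside:
            return None
        if len(inside) == 1:
            return inside[0]  # this mark resolves the ambiguity
        # New specification marks: children of the current mark that still
        # contain at least one sense of the target word.
        candidates = [c for c, p in PARENT.items()
                      if p == mark and any(subsumes(c, s) for s in inside)]
        scored = sorted(((len(words_under(c, context)), c) for c in candidates),
                        reverse=True)
        if len(scored) > 1 and scored[0][0] == scored[1][0]:
            return None  # assumption: a tie leaves the word undisambiguated
        mark = scored[0][1]
```

With this toy data, `disambiguate("plant", CONTEXT)` returns `"plant#2"` and `disambiguate("tree", CONTEXT)` returns `"tree#1"`, while `disambiguate("leaf", CONTEXT)` returns `None`, so leaf would be handed over to the complementary heuristics. The tie rule is only one possible way to model words that the specification marks cannot resolve.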