The set of features defined for the training of the system is described in Figure 9 and is based on the features described by Ng1996 and Escudero2000ecml. These features represent words, collocations, and POS tags in the local context. Both ``collapsed'' and ``non-collapsed'' functions are used.
Actually, each item in Figure 9 groups several sets
of features. The majority of them depend on the nearest words
(e.g., comprises all possible features defined by the words
occurring in each sample at positions
,
,
,
,
,
related to the ambiguous
word). Types nominated with capital letters are based on the
``collapsed'' function form; that is, these features simply
recognize an attribute belonging to the training data.
Keyword features (m) are inspired by Ng1996 work.
Noun filtering is done using frequency information for nouns
co-occurring with a particular sense. For example, let us suppose
for a set of 100 examples of interest#4: if the
noun bank is found 10 times or more at any position, then a
feature is defined.
Moreover, new features have also been defined using other
grammatical properties: relationship features () that refer to
the grammatical relationship of the ambiguous word
(subject, object, complement, ...) and
dependency features (
and
) that extract the word related to the
ambiguous one through the dependency parse tree.