
IdentifinderParser.hpp File Reference

#include "Parser.hpp"
#include "TextHandler.hpp"
#include "LinkedPropertyList.hpp"


Compounds

class IdentifinderParser

Defines

#define BEGIN_PREFIX   "B_"

#define END_PREFIX   "E_"

#define PREFIX_LEN   2

Define Documentation

#define BEGIN_PREFIX "B_"

Parses documents that use document separation tags similar to NIST's Web format: <DOC></DOC> around documents and <DOCNO></DOCNO> around docids. This parser recognizes named entity tags from the Identifinder tagger and passes them along as properties. For each tag X, it also adds b_X and e_X to the first and last token of each entity. For example, if "Carnegie Mellon University" was identified as a place, it would be parsed with the following properties:

Carnegie [b_place] [place]
Mellon [place]
University [e_place] [place]

A single-token entity, such as Madonna, would be parsed as:

Madonna [b_person] [person] [e_person]

Words that are not in the acronym list are case-folded. Contraction and possessive suffixes are stripped. Acronym variants are normalized: U.S.A., USA's, and USAs are all converted to USA. Acronyms containing numbers are not recognized.
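The normalization rules above can be illustrated with a minimal C++ sketch. It is not the Lemur implementation: the `AcronymSet` type and the helpers `stripSuffix`, `collapseDottedAcronym`, and `normalize` are hypothetical, and treating everything after an apostrophe as a removable suffix is a simplifying assumption.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Hypothetical acronym list type; the real parser's list is not shown here.
using AcronymSet = std::set<std::string>;

// Simplified contraction/possessive stripping (an assumption): drop the first
// apostrophe and everything after it, e.g. "USA's" -> "USA".
std::string stripSuffix(const std::string& w) {
  std::string::size_type pos = w.find('\'');
  return pos == std::string::npos ? w : w.substr(0, pos);
}

// Collapses letter-period pairs such as "U.S.A." into "USA". Anything else,
// including forms with digits like "F.1.", is returned unchanged.
std::string collapseDottedAcronym(const std::string& w) {
  if (w.size() < 4 || w.size() % 2 != 0) return w;
  std::string out;
  for (std::string::size_type i = 0; i < w.size(); i += 2) {
    if (!std::isalpha(static_cast<unsigned char>(w[i])) || w[i + 1] != '.')
      return w;
    out += w[i];
  }
  return out;
}

// Normalizes one token: known acronyms (and their plurals) keep their case,
// everything else is case-folded.
std::string normalize(const std::string& token, const AcronymSet& acronyms) {
  std::string w = collapseDottedAcronym(stripSuffix(token));
  if (acronyms.count(w)) return w;
  // Plural acronym: "USAs" -> "USA" when "USA" is a known acronym.
  if (w.size() > 1 && w.back() == 's') {
    std::string stem = w.substr(0, w.size() - 1);
    if (acronyms.count(stem)) return stem;
  }
  for (char& c : w)
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  return w;
}
```

With an acronym list containing only "USA", the tokens U.S.A., USA's, and USAs all normalize to USA, while Carnegie is case-folded to carnegie. Because `collapseDottedAcronym` only accepts letters, a dotted form containing a digit is left as-is, mirroring the note that acronyms with numbers are not recognized.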

#define END_PREFIX "E_"

#define PREFIX_LEN 2
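The way BEGIN_PREFIX, END_PREFIX, and PREFIX_LEN could fit together with the b_X/e_X scheme described under BEGIN_PREFIX is sketched below. This is an assumption about how the macros are used, not code taken from the parser: `TaggedToken`, `tagEntity`, and `typeAfterPrefix` are hypothetical names. The sketch uses the macro values unchanged, so it produces B_place rather than the lowercase b_place shown in the documentation's example; whether the parser case-folds these names is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Values copied from IdentifinderParser.hpp.
#define BEGIN_PREFIX "B_"
#define END_PREFIX "E_"
#define PREFIX_LEN 2

// Hypothetical token type: a word plus the property names attached to it.
struct TaggedToken {
  std::string word;
  std::vector<std::string> props;
};

// Gives every token of one entity the property TYPE, and additionally
// BEGIN_PREFIX + TYPE on the first token and END_PREFIX + TYPE on the last.
std::vector<TaggedToken> tagEntity(const std::vector<std::string>& words,
                                   const std::string& type) {
  std::vector<TaggedToken> out;
  for (std::size_t i = 0; i < words.size(); ++i) {
    TaggedToken t{words[i], {}};
    if (i == 0) t.props.push_back(BEGIN_PREFIX + type);
    t.props.push_back(type);
    if (i + 1 == words.size()) t.props.push_back(END_PREFIX + type);
    out.push_back(t);
  }
  return out;
}

// Recovers TYPE from a boundary property name by skipping PREFIX_LEN
// characters; returns "" if name does not start with prefix.
std::string typeAfterPrefix(const char* name, const char* prefix) {
  if (std::strncmp(name, prefix, PREFIX_LEN) != 0) return "";
  return std::string(name + PREFIX_LEN);
}
```

Tagging Carnegie Mellon University as a place gives the first token B_place and place, the middle token place, and the last token place and E_place; a single-token entity such as Madonna receives all three. PREFIX_LEN lets the type be read back from a boundary property without hard-coding the prefix length at each call site.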

Generated on Wed Nov 3 12:59:12 2004 for Lemur Toolkit by doxygen 1.2.18

