
FreqCounter Class Reference

#include <FreqCounter.hpp>

Inherits TextHandler.

Public Methods

 FreqCounter (const Stopper *stopWords=NULL)
 FreqCounter (const string &filename, const Stopper *stopWords=NULL)
 ~FreqCounter ()
 Delete the frequency counter.

void clear ()
 Clear the frequency counter (set all counts to 0).

void output (const string &filename) const
 Output the frequency information to a file.

char * randomWord ()
void setRandomMode (int mode)
int getRandomMode () const
char * randomCtf () const
char * randomDf () const
char * randomAveTf () const
char * randomUniform () const
int numWords () const
int totWords () const
const freqmap * getFreqInfo () const
int getCtf (const char *word) const
int getDf (const char *word) const
double getAveTf (const char *word) const
double ctfRatio (FreqCounter &lm1) const
char * handleDoc (char *docno)
 Overridden from TextHandler.

char * handleWord (char *word)
 Overridden from TextHandler - increments collection term frequencies.

void endDoc ()
 Specifies end of a document - updates document frequencies.

void setName (const string &freqCounterName)
 Set the name of language model described by the frequency counter.

const string & getName () const
 Get the counter's name.

void pruneBottomWords (int topWords)
 Prune least frequent words, keeping only topWords most frequent words.


Protected Methods

void input (const string &filename)

Protected Attributes

freqmap freqInfo
stringset doc
stringset randdone
string name
const Stopper * stopper
long ctfTot
int dfTot
long double avetfTot
bool atfValid
int randomMode
int nWords

Detailed Description

Counts collection term frequencies and document frequencies. Also provides a means for selecting random words. The FreqCounter can use a stopword list.
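The counting protocol described here (one handleWord call per token, one endDoc call per document) can be illustrated with a minimal sketch. MiniFreqCounter and WordStats below are hypothetical stand-ins written for this example, not the Lemur implementation: they use standard containers in place of freqmap and stringset, ignore stopwords, and simplify the handler signatures.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical per-word record for this sketch (not Lemur's freqmap entry).
struct WordStats {
    int ctf = 0;  // collection term frequency: total occurrences of the word
    int df = 0;   // document frequency: number of documents containing it
};

// Hypothetical stand-in that mirrors the handleWord()/endDoc() protocol only.
class MiniFreqCounter {
public:
    // Called once per token: counts the occurrence and remembers that the
    // word appeared in the current document.
    void handleWord(const std::string& word) {
        ++stats_[word].ctf;
        ++totalWords_;
        currentDoc_.insert(word);
    }

    // Called at the end of a document: each distinct word seen in that
    // document adds exactly one to its document frequency.
    void endDoc() {
        for (const std::string& word : currentDoc_) {
            ++stats_[word].df;
        }
        currentDoc_.clear();
    }

    int getCtf(const std::string& word) const {
        auto it = stats_.find(word);
        return it == stats_.end() ? 0 : it->second.ctf;
    }

    int getDf(const std::string& word) const {
        auto it = stats_.find(word);
        return it == stats_.end() ? 0 : it->second.df;
    }

    int numWords() const { return static_cast<int>(stats_.size()); }
    int totWords() const { return totalWords_; }

private:
    std::map<std::string, WordStats> stats_;  // plays the role of freqInfo
    std::set<std::string> currentDoc_;        // plays the role of doc
    int totalWords_ = 0;
};
```

In this sketch a word that occurs three times in one document raises its ctf by three but its df by only one, which is why the per-document set is needed.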


Constructor & Destructor Documentation

FreqCounter::FreqCounter (const Stopper * stopWords = NULL)

Create a frequency counter with the specified stopword list. The stopWords parameter is optional.

FreqCounter::FreqCounter (const string & filename, const Stopper * stopWords = NULL)

Create a frequency counter (loading it from file) with the specified stopword list. The stopWords parameter is optional.

FreqCounter::~FreqCounter ()

Delete the frequency counter.


Member Function Documentation

void FreqCounter::clear ()

Clear the frequency counter (set all counts to 0).

double FreqCounter::ctfRatio (FreqCounter & lm1) const

Compare lm1 to this language model, returning the ctf ratio.

void FreqCounter::endDoc ()

Specifies end of a document - updates document frequencies.

double FreqCounter::getAveTf (const char * word) const

Get the average term frequency for a word.
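This page does not define average term frequency. One common reading, assumed here rather than confirmed by the source, is the collection term frequency divided by the document frequency, that is, the mean number of occurrences per document that contains the word. The helper averageTf below is hypothetical and only illustrates that reading.

```cpp
#include <cassert>

// Assumed definition: occurrences per containing document (ctf / df).
// Returns 0.0 for a word that has not been seen in any document.
double averageTf(int ctf, int df) {
    assert(ctf >= 0 && df >= 0);
    return df == 0 ? 0.0 : static_cast<double>(ctf) / df;
}
```

Under this assumption a word with 6 occurrences spread over 4 documents has an average term frequency of 1.5.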

int FreqCounter::getCtf (const char * word) const

Get the collection term frequency for a word.

int FreqCounter::getDf (const char * word) const

Get the document frequency for a word.

const freqmap * FreqCounter::getFreqInfo () const

Get a pointer to the internal frequency count map.

const string & FreqCounter::getName () const

Get the counter's name.

int FreqCounter::getRandomMode () const

Gets the current random word mode. See setRandomMode(...)

char * FreqCounter::handleDoc (char * docno) [virtual]

Overridden from TextHandler.

Reimplemented from TextHandler.

char * FreqCounter::handleWord (char * word) [virtual]

Overridden from TextHandler - increments collection term frequencies.

Reimplemented from TextHandler.

void FreqCounter::input (const string & filename) [protected]

int FreqCounter::numWords () const

Return the number of unique words seen across all documents processed.

void FreqCounter::output (const string & filename) const

Output the frequency information to a file.

void FreqCounter::pruneBottomWords (int topWords)

Prune least frequent words, keeping only topWords most frequent words.
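Pruning to the most frequent words can be sketched as a sort followed by a truncation. The free function below is a hypothetical illustration over a plain word-to-count map, not the member function itself. It assumes that "frequent" means collection term frequency, and it breaks ties at the cut-off alphabetically, which is an arbitrary choice made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> WordCount;

// Keeps only the `topWords` entries with the highest counts.
// Assumption for this sketch: counts are collection term frequencies.
void pruneBottomWords(std::map<std::string, int>& counts, int topWords) {
    assert(topWords >= 0);
    const std::size_t keep = static_cast<std::size_t>(topWords);
    if (counts.size() <= keep) {
        return;  // nothing to prune
    }

    // Rank words by descending count, then alphabetically for ties.
    std::vector<WordCount> ranked(counts.begin(), counts.end());
    std::sort(ranked.begin(), ranked.end(),
              [](const WordCount& a, const WordCount& b) {
                  if (a.second != b.second) {
                      return a.second > b.second;
                  }
                  return a.first < b.first;
              });

    // Drop everything below the cut-off and rebuild the map.
    ranked.resize(keep);
    counts = std::map<std::string, int>(ranked.begin(), ranked.end());
}
```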

char * FreqCounter::randomAveTf () const

Select a word at random using average term frequency. The word is not guaranteed to differ from words returned by other calls to this function.

char * FreqCounter::randomCtf () const

Select a word at random using collection term frequency. The word is not guaranteed to differ from words returned by other calls to this function.

char * FreqCounter::randomDf () const

Select a word at random using document frequency. The word is not guaranteed to differ from words returned by other calls to this function.

char * FreqCounter::randomUniform () const

Select a word at random with equal probability for each word. The word is not guaranteed to differ from words returned by other calls to this function.

char * FreqCounter::randomWord ()

Get a random word from the distribution specified by setRandomMode. The word is unique among those returned since the last clear operation.

void FreqCounter::setName (const string & freqCounterName)

Set the name of language model described by the frequency counter.

void FreqCounter::setRandomMode (int mode)

Set the random word selection mode:
  R_CTF - select using collection term frequency
  R_DF - select using document frequency
  R_AVE_TF - select using average term frequency
  R_UNIFORM - select each word with equal probability
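The four modes can be read as four ways of weighting each word before a weighted random draw. The sketch below is one possible illustration, not the class's actual code. The Mode enum, WordStats, weightOf and drawWord are hypothetical names; the average term frequency is assumed to be ctf divided by df; and the `done` set mimics the no-repeat behaviour documented for randomWord.

```cpp
#include <cassert>
#include <map>
#include <random>
#include <set>
#include <string>

struct WordStats {
    int ctf;  // collection term frequency
    int df;   // document frequency
};

// Hypothetical mode names; the class itself uses R_CTF, R_DF, R_AVE_TF
// and R_UNIFORM.
enum class Mode { Ctf, Df, AveTf, Uniform };

// Weight of a word under the chosen mode (AveTf assumed to be ctf / df).
double weightOf(const WordStats& s, Mode mode) {
    switch (mode) {
        case Mode::Ctf:     return s.ctf;
        case Mode::Df:      return s.df;
        case Mode::AveTf:   return s.df == 0 ? 0.0 : static_cast<double>(s.ctf) / s.df;
        case Mode::Uniform: return 1.0;
    }
    return 0.0;
}

// Draws one word with probability proportional to its weight, skipping
// words already in `done`, and records the result in `done`.
// Returns an empty string when no eligible word remains.
std::string drawWord(const std::map<std::string, WordStats>& stats, Mode mode,
                     std::set<std::string>& done, std::mt19937& rng) {
    double total = 0.0;
    for (const auto& entry : stats) {
        if (done.count(entry.first) == 0) {
            total += weightOf(entry.second, mode);
        }
    }
    if (total <= 0.0) {
        return "";
    }

    // Pick a point in [0, total) and walk the cumulative weights to it.
    std::uniform_real_distribution<double> dist(0.0, total);
    double target = dist(rng);
    std::string chosen;
    for (const auto& entry : stats) {
        if (done.count(entry.first) != 0) continue;
        const double w = weightOf(entry.second, mode);
        if (w <= 0.0) continue;
        chosen = entry.first;  // also the fallback if rounding overshoots
        if (target < w) break;
        target -= w;
    }
    done.insert(chosen);
    return chosen;
}
```

Clearing `done` in this sketch corresponds to the point after which words may be returned again.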

int FreqCounter::totWords () const

Return the total words seen across all documents processed.


Member Data Documentation

bool FreqCounter::atfValid [protected]
 

long double FreqCounter::avetfTot [protected]
 

long FreqCounter::ctfTot [protected]
 

int FreqCounter::dfTot [protected]
 

stringset FreqCounter::doc [protected]
 

freqmap FreqCounter::freqInfo [protected]
 

string FreqCounter::name [protected]
 

int FreqCounter::nWords [protected]
 

stringset FreqCounter::randdone [protected]
 

int FreqCounter::randomMode [protected]
 

const Stopper* FreqCounter::stopper [protected]
 


The documentation for this class was generated from the following files:
Generated on Wed Nov 3 12:59:34 2004 for Lemur Toolkit by doxygen 1.2.18