
ng_t Struct Reference

Data structure for storing an n-gram language model.

#include <ngram.h>


Public Attributes

unsigned short n
int version
sih_t * vocab_ht
unsigned short vocab_size
char ** vocab
unsigned short no_of_ccs
table_size_t * table_sizes
id__t ** word_id
count_ind_t ** count
count_ind_t * marg_counts
int ** count4
int * marg_counts4
bo_weight_t ** bo_weight
four_byte_t ** bo_weight4
index__t ** ind
double min_alpha
double max_alpha
unsigned short out_of_range_alphas
double * alpha_array
unsigned short size_of_alpha_array
count_ind_t count_table_size
count_t ** count_table
ptr_tab_t ** ptr_table
unsigned short * ptr_table_size
unsigned short discounting_method
cutoff_t * cutoffs
int ** freq_of_freq
unsigned short * fof_size
unsigned short * disc_range
disc_val_t ** gt_disc_ratio
disc_val_t * lin_disc_ratio
double * abs_disc_const
uni_probs_t * uni_probs
uni_probs_t * uni_log_probs
flag * context_cue
int n_unigrams
int min_unicount
char * id_gram_filename
FILE * id_gram_fp
char * vocab_filename
char * context_cues_filename
FILE * context_cues_fp
flag write_arpa
char * arpa_filename
FILE * arpa_fp
flag write_bin
char * bin_filename
FILE * bin_fp
int * num_kgrams
unsigned short vocab_type
unsigned short first_id
double zeroton_fraction
double oov_fraction
flag four_byte_alphas
flag four_byte_counts


Detailed Description

Data structure for storing an n-gram language model.

Definition at line 90 of file ngram.h.


Member Data Documentation

double* ng_t::abs_disc_const
 

The constant required for absolute discounting

Definition at line 160 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), compute_unigram(), display_stats(), load_lm(), main(), and write_bin_lm().
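
For illustration, a commonly used estimate of the absolute discounting constant is b = n1 / (n1 + 2 * n2), where n1 and n2 are the numbers of n-grams seen exactly once and exactly twice. The sketch below assumes a frequency-of-frequency array indexed by count; the function name abs_disc_estimate is hypothetical, and this page does not state that the toolkit computes the constant in exactly this way.

```c
/* Hypothetical helper, not part of ngram.h.
   fof[r] is assumed to hold the number of distinct n-grams seen
   exactly r times; fof[0] is unused. */
static double abs_disc_estimate(const int *fof, int fof_size)
{
    double n1, n2;

    if (fof_size < 3)
        return -1.0;              /* need both n1 and n2 */

    n1 = (double)fof[1];
    n2 = (double)fof[2];
    if (n1 + 2.0 * n2 <= 0.0)
        return -1.0;              /* no singletons or doubletons */

    return n1 / (n1 + 2.0 * n2);  /* always in (0, 1] when defined */
}
```

Under absolute discounting, a count r is then replaced by r - b, so every seen n-gram gives up the same fixed amount of mass.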

double* ng_t::alpha_array
 

Definition at line 130 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), for(), load_lm(), main(), and write_bin_lm().

char* ng_t::arpa_filename
 

The filename of the ARPA-format LM

Definition at line 187 of file ngram.h.

Referenced by main().

FILE* ng_t::arpa_fp
 

The file pointer of the ARPA-format LM

Definition at line 188 of file ngram.h.

Referenced by for(), if(), and main().

char* ng_t::bin_filename
 

The filename of the binary-format LM

Definition at line 191 of file ngram.h.

Referenced by main(), and write_bin_lm().

FILE* ng_t::bin_fp
 

The file pointer of the binary-format LM

Definition at line 192 of file ngram.h.

Referenced by load_lm(), main(), and write_bin_lm().

bo_weight_t** ng_t::bo_weight
 

Pointer to array of back-off weights

Definition at line 117 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), for(), load_lm(), main(), and write_bin_lm().

four_byte_t** ng_t::bo_weight4
 

Pointer to array of 4-byte back-off weights. Only one of bo_weight and bo_weight4 will be allocated

Definition at line 118 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), for(), load_lm(), main(), and write_bin_lm().

flag* ng_t::context_cue
 

True if word with this id is a context cue

Definition at line 167 of file ngram.h.

Referenced by compute_perplexity(), compute_unigram(), display_stats(), load_lm(), main(), validate(), and write_bin_lm().

char* ng_t::context_cues_filename
 

The filename of the context cues file

Definition at line 178 of file ngram.h.

Referenced by main().

FILE* ng_t::context_cues_fp
 

The file pointer of the context cues file

Definition at line 180 of file ngram.h.

Referenced by main().

count_ind_t** ng_t::count
 

Pointer to array of count lists (actually indices in a count table)

Definition at line 108 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), compute_unigram(), increment_context(), load_lm(), main(), and write_bin_lm().

int** ng_t::count4
 

Alternative method of storing the counts, using 4 bytes. Not normally allocated

Definition at line 114 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), compute_unigram(), increment_context(), load_lm(), main(), and write_bin_lm().

count_t** ng_t::count_table
 

Pointer to array of count tables

Definition at line 136 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), compute_unigram(), increment_context(), load_lm(), main(), and write_bin_lm().
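
The relationship between count and count_table can be sketched as a two-step lookup: the per-n-gram entry is a compact index, and the table maps that index to the full count. The typedefs and the function lookup_count below are hypothetical stand-ins chosen for the example; the real definitions of count_ind_t and count_t may differ, and the toolkit may treat some index values specially.

```c
/* Hypothetical stand-ins for the toolkit typedefs. */
typedef unsigned short count_ind_t;  /* compact index stored per n-gram */
typedef int count_t;                 /* full-width count value */

/* Resolve the index stored for entry `pos` to its actual count.
   Returns -1 if the stored index falls outside the table. */
static count_t lookup_count(const count_ind_t *count,
                            const count_t *count_table,
                            count_ind_t count_table_size,
                            int pos)
{
    count_ind_t idx = count[pos];

    if (idx >= count_table_size)
        return -1;
    return count_table[idx];
}
```

Storing a short index per n-gram keeps the large per-n-gram arrays small, while the table holds each distinct count value once.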

count_ind_t ng_t::count_table_size
 

Size of each count table (all count tables have the same size)

Definition at line 135 of file ngram.h.

Referenced by increment_context(), load_lm(), main(), and write_bin_lm().

cutoff_t* ng_t::cutoffs
 

Array of cutoffs

Definition at line 149 of file ngram.h.

Referenced by calc_mem_req(), load_lm(), main(), and write_bin_lm().

unsigned short* ng_t::disc_range
 

Pointer to array of discounting ranges - typically will be fof_size - 1, but can be reduced further if stats are anomalous

Definition at line 153 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), compute_unigram(), display_stats(), increment_context(), load_lm(), main(), and write_bin_lm().

unsigned short ng_t::discounting_method
 

The discounting method in use; see the #define constants at the top of ngram.h

Definition at line 147 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), compute_unigram(), display_stats(), load_lm(), main(), and write_bin_lm().

unsigned short ng_t::first_id
 

0 if we have open vocab, 1 if we have a closed vocab.

Definition at line 202 of file ngram.h.

Referenced by compute_back_off(), compute_unigram(), display_stats(), increment_context(), load_lm(), main(), validate(), and write_bin_lm().

unsigned short* ng_t::fof_size
 

The sizes of the freq_of_freq arrays

Definition at line 152 of file ngram.h.

Referenced by display_stats(), load_lm(), main(), and write_bin_lm().

flag ng_t::four_byte_alphas
 

Definition at line 213 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), display_stats(), for(), load_lm(), main(), and write_bin_lm().

flag ng_t::four_byte_counts
 

Definition at line 214 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), compute_unigram(), increment_context(), load_lm(), main(), and write_bin_lm().

int** ng_t::freq_of_freq
 

Array of frequency of frequency information

Definition at line 150 of file ngram.h.

Referenced by compute_unigram(), display_stats(), load_lm(), main(), and write_bin_lm().

disc_val_t** ng_t::gt_disc_ratio
 

The discounted values of the counts

Definition at line 157 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), compute_unigram(), display_stats(), load_lm(), main(), and write_bin_lm().
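
As a rough illustration of how a Good-Turing discount ratio can be derived from frequency-of-frequency data, the sketch below computes the plain ratio d_r = (r + 1) * n_(r+1) / (r * n_r). It omits the renormalisation used in Katz-style discounting, and the function name gt_ratio and the array layout are assumptions made for the example rather than the toolkit's own code.

```c
/* Hypothetical helper, not part of ngram.h.
   fof[r] is assumed to hold the number of distinct n-grams seen
   exactly r times, for 0 <= r < fof_size. */
static double gt_ratio(const int *fof, int fof_size, int r)
{
    /* The ratio needs n_(r+1), so the largest usable r is
       fof_size - 2; outside that range leave the count undiscounted. */
    if (r < 1 || r + 1 >= fof_size || fof[r] == 0)
        return 1.0;

    return ((double)(r + 1) * (double)fof[r + 1]) /
           ((double)r * (double)fof[r]);
}
```

The discounted count for an n-gram seen r times is then r * d_r. A ratio above 1 for some r would be one sign that the statistics are anomalous for that range.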

char* ng_t::id_gram_filename
 

The filename of the id-gram file

Definition at line 175 of file ngram.h.

Referenced by calc_mem_req(), and main().

FILE* ng_t::id_gram_fp
 

The file pointer of the id-gram file

Definition at line 176 of file ngram.h.

Referenced by calc_mem_req(), and main().

index__t** ng_t::ind
 

Pointer to array of index lists

Definition at line 121 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), increment_context(), load_lm(), main(), num_of_types(), and write_bin_lm().

disc_val_t* ng_t::lin_disc_ratio
 

The linear discounting ratio

Definition at line 159 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), compute_unigram(), display_stats(), load_lm(), main(), and write_bin_lm().
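
A common choice for the linear discounting ratio is 1 - n1 / C, where n1 is the number of n-grams seen exactly once and C is the total number of n-gram tokens. The sketch below assumes that estimate; the function name lin_disc_ratio_estimate is hypothetical, and this page does not confirm the exact formula the toolkit applies.

```c
/* Hypothetical helper, not part of ngram.h.
   n1:    number of n-grams seen exactly once
   total: total number of n-gram tokens observed */
static double lin_disc_ratio_estimate(int n1, int total)
{
    if (total <= 0 || n1 < 0 || n1 > total)
        return -1.0;  /* inputs do not describe a valid corpus */

    return 1.0 - (double)n1 / (double)total;
}
```

Every count r is then scaled to ratio * r, so frequent n-grams give up more absolute mass than rare ones.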

count_ind_t* ng_t::marg_counts
 

Array of marginal counts for the unigrams. The normal unigram counts differ in that context cues have zero counts there, but not here

Definition at line 110 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), increment_context(), load_lm(), main(), and write_bin_lm().
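
The difference between the ordinary unigram counts and the marginal counts can be sketched as follows: both start from the same raw counts, but the ordinary counts are zeroed for context cues. The function split_unigram_counts, the raw array, and the use of plain int arrays are assumptions made for the example.

```c
#include <stddef.h>

typedef int flag;  /* hypothetical stand-in for the toolkit's flag type */

/* Fill `count` and `marg_counts` from raw per-word counts.
   Context cues keep their marginal count but get a zero unigram count. */
static void split_unigram_counts(const int *raw,
                                 const flag *context_cue,
                                 size_t vocab_size,
                                 int *count,
                                 int *marg_counts)
{
    size_t id;

    for (id = 0; id < vocab_size; id++) {
        marg_counts[id] = raw[id];
        count[id] = context_cue[id] ? 0 : raw[id];
    }
}
```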

int* ng_t::marg_counts4
 

Alternative method of storing the marginal counts, using 4 bytes. As with count4, not normally allocated

Definition at line 116 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), increment_context(), load_lm(), main(), and write_bin_lm().

double ng_t::max_alpha
 

The maximum alpha in the table

Definition at line 127 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), for(), load_lm(), main(), and write_bin_lm().

double ng_t::min_alpha
 

The minimum alpha in the table

Definition at line 126 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), for(), load_lm(), main(), and write_bin_lm().

int ng_t::min_unicount
 

Count to which infrequent unigrams will be bumped up

Definition at line 171 of file ngram.h.

Referenced by main().

unsigned short ng_t::n
 

n=3 for trigram, n=4 for 4-gram etc.

Definition at line 94 of file ngram.h.

Referenced by calc_mem_req(), compute_perplexity(), display_stats(), for(), load_lm(), main(), validate(), and write_bin_lm().

int ng_t::n_unigrams
 

Total number of unigrams in the training data

Definition at line 169 of file ngram.h.

Referenced by compute_unigram(), load_lm(), main(), and write_bin_lm().

unsigned short ng_t::no_of_ccs
 

Number of context cues

Definition at line 102 of file ngram.h.

Referenced by compute_unigram(), display_stats(), load_lm(), main(), and write_bin_lm().

int* ng_t::num_kgrams
 

Array indicating how many 2-grams, ... ,n-grams, have been processed so far

Definition at line 196 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), display_stats(), increment_context(), load_lm(), main(), num_of_types(), and write_bin_lm().

double ng_t::oov_fraction
 

Definition at line 212 of file ngram.h.

Referenced by compute_unigram(), display_stats(), load_lm(), main(), and write_bin_lm().

unsigned short ng_t::out_of_range_alphas
 

The maximum number of out of range alphas that we are going to allow.

Definition at line 128 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), for(), load_lm(), main(), and write_bin_lm().

ptr_tab_t** ng_t::ptr_table
 

Pointer to the tables used for compact representation of the indices

Definition at line 140 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), increment_context(), load_lm(), main(), num_of_types(), and write_bin_lm().

unsigned short* ng_t::ptr_table_size
 

Pointer to array of pointer table sizes

Definition at line 142 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), increment_context(), load_lm(), main(), num_of_types(), and write_bin_lm().

unsigned short ng_t::size_of_alpha_array
 

Definition at line 131 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), for(), load_lm(), main(), and write_bin_lm().

table_size_t* ng_t::table_sizes
 

Pointer to table size array

Definition at line 106 of file ngram.h.

Referenced by calc_mem_req(), and main().

uni_probs_t* ng_t::uni_log_probs
 

Log probs for each unigram

Definition at line 166 of file ngram.h.

Referenced by compute_unigram(), for(), load_lm(), main(), and write_bin_lm().

uni_probs_t* ng_t::uni_probs
 

Probs for each unigram

Definition at line 165 of file ngram.h.

Referenced by bo_ng_prob(), compute_unigram(), for(), load_lm(), main(), and write_bin_lm().

int ng_t::version
 

Definition at line 95 of file ngram.h.

Referenced by load_lm(), and write_bin_lm().

char** ng_t::vocab
 

Array of vocabulary words

Definition at line 101 of file ngram.h.

Referenced by compute_back_off(), compute_perplexity(), display_stats(), for(), load_lm(), main(), and validate().

char* ng_t::vocab_filename
 

The filename of the vocabulary file

Definition at line 177 of file ngram.h.

Referenced by main().

sih_t* ng_t::vocab_ht
 

Vocabulary hash table

Definition at line 99 of file ngram.h.

Referenced by compute_perplexity(), load_lm(), main(), validate(), and write_bin_lm().

unsigned short ng_t::vocab_size
 

Vocabulary size

Definition at line 100 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), compute_perplexity(), compute_unigram(), display_stats(), increment_context(), load_lm(), main(), num_of_types(), validate(), and write_bin_lm().

unsigned short ng_t::vocab_type
 

The vocabulary type; see the #define constants at the top of ngram.h

Definition at line 200 of file ngram.h.

Referenced by calc_prob_of(), compute_perplexity(), compute_unigram(), display_stats(), load_lm(), main(), validate(), and write_bin_lm().

id__t** ng_t::word_id
 

Pointer to array of id lists

Definition at line 107 of file ngram.h.

Referenced by bo_ng_prob(), compute_back_off(), load_lm(), main(), and write_bin_lm().

flag ng_t::write_arpa
 

True if the language model is to be written out in arpa format

Definition at line 185 of file ngram.h.

Referenced by main().

flag ng_t::write_bin
 

True if the language model is to be written out in binary format

Definition at line 189 of file ngram.h.

Referenced by main().

double ng_t::zeroton_fraction
 

Cap on P(zeroton) as a fraction of P(singleton)

Definition at line 210 of file ngram.h.

Referenced by compute_unigram(), load_lm(), main(), and write_bin_lm().
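
Read literally, the description above limits the probability of a zeroton (a vocabulary word never seen in training) to zeroton_fraction times the probability of a singleton. A minimal sketch of that cap, assuming both probabilities have already been estimated, is shown below; the function name cap_zeroton_prob is hypothetical.

```c
/* Hypothetical helper, not part of ngram.h.
   Returns p_zeroton, limited to zeroton_fraction * p_singleton. */
static double cap_zeroton_prob(double p_zeroton,
                               double p_singleton,
                               double zeroton_fraction)
{
    double cap = zeroton_fraction * p_singleton;

    return (p_zeroton > cap) ? cap : p_zeroton;
}
```

With zeroton_fraction set to 1.0, an unseen word can never be assigned more probability than a word seen once.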


The documentation for this struct was generated from the following file:
ngram.h

Generated on Tue Dec 21 13:54:48 2004 by doxygen 1.2.18