In the main of sphinx3-anytopo (src/libfbs/main.c)

<- cmd_ln_define
<- cmd_ln_print_definitions
<- cmdline_parse
<- load_argfile
<- logs3_init (Can be found in src/libfbs/logs3.c. What it does: precomputes the log-add table. The comment in the source explains everything.)
<- feat_init (Can be found in src/libfeat/feat.c. What it does: initializes the feature stream type; pretty trivial.)
<- models_init (Can be found in src/libfbs/main.c. What it does: initializes all data structures for decoding.)
    ^---mdef_init (Can be found in src/libfbs/mdef.c. What it does: reads and initializes the model architecture (model definition) file.)
    ^---parse_base_line
    ^---parse_tri_line
    ^---dict_init (Can be found in src/libfbs/dict.c. What it does: reads in the main and filler dictionaries.)
    ^---dict_read
    ^---dict_wordid
    ^---gauden_init (Can be found in src/libfbs/gauden.c. What it does: reads in the mean and var files, precomputes the variance determinants and applies the variance floor.)
    ^---gauden_param_read
    ^---gauden_dist_precompute
    ^---logs3_to_log
    ^---feat_featsize
    ^---senone_init (Can be found in src/libfbs/senone.c. What it does: reads in the mixture weights and creates a mapping between senones and codebooks.)
    ^---sen2mgau_map_file
    ^---senone_mixw_read
    ^---interp_init (Can be found in src/libfbs/interp.c. What it does: reads in the interpolation weights.)
    ^---lm_read (Can be found in src/libfbs/lm.c. What it does: reads in the language model.)
    ^---lm_read_dump
    ^---lm_fread_int32
    ^---lm_add
    ^---fillpen_init (Can be found in src/libfbs/fillpen.c. What it does: initializes the filler penalties.)
<- fwd_init
    ^---mdef_getmdef
    ^---tmat_gettmat
    ^---dict_getdict
    ^---lm_current
    ^---build_wwpid
    ^---build_xwdpid_map
    ^---build_rcpid
    ^---build_lcpid
    ^---build_lrcpid
    ^---chk_tp_uppertri
<- process_ctlfiles (Can be found in src/libfbs/main.c. What it does: applies MLLR if required; this is where recognition actually starts.)
    ^---gauden_mean_reload
    ^---mllr_read_regmat
    ^---mllr_norm_mgau
    ^---s2mfc_read
    ^---log_hypstr
    ^---log_hypseg
    ^---decode_utt (Can be found in src/libfbs/main.c. What it does: carries out the forward search and the best path search for each utterance.)
    ^---fwdvit (Can be found in src/libfbs/fwdvit.c. What it does: carries out the forward search, the main loop of recognition.)
    ^---fwd_start_utt (Can be found in src/libfbs/newfwd.c. What it does: initializes the search.)
    ^---word_cand_load
    ^---lmcontext
    ^---word_enter (Can be found in src/libfbs/newfwd.c. What it does: performs the first step for each word in the Viterbi search.)
    ^---fwd_sen_active
    ^---interp_cd_ci
    ^---gauden_dist (Can be found in src/libfbs/gauden.c. What it does: computes the likelihood of the input vector given the Gaussian distribution.)
    ^---compute_dist
    ^---compute_dist_all (A special loop-unrolled version that computes all Gaussians in a senone.)
    ^---senone_eval
    ^---senone_eval_all (Can be found in src/libfbs/senone.c. What it does: computes all senone scores for the fully continuous case.)
    ^---fwd_frame (Can be found in src/libfbs/newfwd.c. What it does: advances decoding by one frame.)
    ^---whmm_eval
    ^---eval_mpx_whmm
    ^---eval_non_mpx_whmm
    ^---dump_all_whmm
    ^---dump_all_word
    ^---whmm_exit
    ^---lattice_entry
    ^---whmm_transition
    ^---two_word_history
    ^---lm_tg_score
    ^---load_tg
    ^---word_trans
    ^---whmm_renorm
    ^---counter_increment
    ^---fwd_end_utt
    ^---log_hypstr
    ^---lm_cache_stats_dump
    ^---write_bestscore
    ^---dag_dump
    ^---dag_search
    ^---dag_add_fudge_edges
    ^---dag_remove_filler_nodes
    ^---dag_best_path
    ^---dag_chk_linkscr
    ^---dag_backtrace
    ^---log_hyp_detailed

Detailed Description of Routines

logs3_init : Quoted from the comment
in the source code.

/*
 * In evaluating HMM models, probability values are often kept in log domain,
 * to avoid overflow. Furthermore, to enable these logprob values to be held
 * in int32 variables without significant loss of precision, a logbase of
 * (1+epsilon), epsilon<<1, is used. This module maintains this logbase (B).
 *
 * More important, maintaining probabilities in log domain creates a problem when
 * adding two probability values: difficult in the log domain.
 * Suppose P = Q+R (0 <= P,Q,R,Q+R <= 1), and we have to compute:
 *   logB(P), given logB(Q) and logB(R). Assume Q >= R.
 *   Let z = logB(P), x = logB(Q), y = logB(R).
 *   Therefore, B^z = B^x + B^y = B^x(1 + B^(y-x)).
 *   Therefore, z = x + logB(1+B^(y-x)).
 * Since the latter term only depends on y-x, and log probs are kept in integer
 * variables, it can be precomputed into a table for y-x = 0, -1, -2, -3... until
 * logB(1+B^(y-x)) = (int32) 0.
 */

mdef_init : notice that this function also initializes the word position for the triphones. CI phones and triphones are all put in the triphone data structure but have different attributes. See the difference between parse_base_line and parse_tri_line.

gauden_init : notice that gauden_dist_precompute actually replaces the stored variances with their precomputed form.

senone_init : notice that you can actually provide a senone-to-codebook mapping file.

Senone computation in s3:

This is done in the following manner:
1, feat_cep2feat: compute the feature from its neighborhood. The whole waveform is already available as feature vectors; at this point we compute the "true" feature by computing the delta cepstra and delta-delta cepstra.
2, gauden_dist: compute all Gaussian densities. (Scores will be in float.)
3, senone_eval: given the Gaussian scores, compute the senone (GMM) scores. Scores will be in integer.
4, (Optional) interp_all: interpolate between the CI and CD models.
5, The next step is to carry out the forward search for this frame (fwd_frame).

The whole senone computation is done over a block of frames at a time. According to Ravi, this improves cache performance.