Documentation: Training SAT Models for DNNs

cmds2/run_DNN_SAT.py -- Training SAT Models for DNNs

---------------------------------------------------------------------------------------------------------------------

Refer to this webpage for more information about SAT for DNNs.

Arguments

argument	meaning/value	comments
--train-data	training data specification	required
--valid-data	validation data specification	required
--si-nnet-spec	--si-nnet-spec="dF:h(1):h(2):...:h(n):s", e.g. 250:1024:1024:1024:1024:1920	required. Specifies the structure of the SI model. dF: feature dimension; h(i): size of the i-th hidden layer; s: number of targets
--adapt-nnet-spec	--adapt-nnet-spec="dI:ha(1):ha(2):...:ha(m)", e.g. 100:512:512	required. Specifies the structure of the Adaptation model. dI: i-vector dimension; ha(i): size of the i-th adaptation layer
--init-model	path to the initial DNN model	required. A well-trained DNN model which serves as the initialization of the SI model
--wdir	working directory	required

--param-output-file	(prefix) path to save model parameters in the PDNN format	by default "": doesn't output PDNN-formatted model. Filenames for the SI and Adaptation models are appended with the suffix ".si" and ".adapt" respectively
--cfg-output-file	(prefix) path to save model config	by default "": doesn't output model config. Filenames for the SI and Adaptation models are appended with the suffix ".si" and ".adapt" respectively
--kaldi-output-file	(prefix) path to save the Kaldi-formatted model	by default "": doesn't output Kaldi-formatted model. Filenames for the SI and Adaptation models are appended with the suffix ".si" and ".adapt" respectively
--model-save-step	number of epochs between model saving	by default 1: save the tmp model after each epoch

--ptr-file	pre-trained model file	by default "": no pre-training
--ptr-layer-number	number of layers to initialize with the pre-trained model	required if --ptr-file is provided

--lrate	learning rate	by default D:0.08:0.5:0.05,0.05:15
--batch-size	mini-batch size for SGD	by default 256
--momentum	the momentum	by default 0.5

--activation	the same as dnn	by default sigmoid

--input-dropout-factor	the same as dnn	by default 0: no dropout is applied to the input features
--dropout-factor	the same as dnn	by default "": no dropout is applied.

--l1-reg	L1-norm regularization weight: train_objective = cross_entropy + l1_reg * [L1 norm of all weight matrices]	by default 0
--l2-reg	L2-norm regularization weight: train_objective = cross_entropy + l2_reg * [L2 norm of all weight matrices]	by default 0
--max-col-norm	the maximum value of the norm of gradients; usually used with dropout and maxout	by default none: not applied
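
The --l1-reg and --l2-reg rows can be read as penalties added to the cross-entropy loss. The following is a minimal sketch in plain Python, not PDNN's actual implementation. The function name regularized_objective and the nested-list weight representation are hypothetical. It assumes the L1 norm is the sum of absolute entries and the L2 norm is the square root of the sum of squared entries, each summed over all weight matrices; some toolkits use the squared L2 form instead, so treat the exact norm as an assumption.

```python
import math


def regularized_objective(cross_entropy, weights, l1_reg=0.0, l2_reg=0.0):
    """Return cross_entropy plus weighted L1 and L2 penalties.

    weights is a list of matrices, each a list of rows of floats
    (a hypothetical stand-in for the model's weight matrices).
    """
    # Assumed L1 norm: sum of absolute values over every entry of every matrix.
    l1_penalty = sum(abs(w) for matrix in weights for row in matrix for w in row)
    # Assumed L2 norm: per-matrix sqrt of the sum of squares, summed over matrices.
    l2_penalty = sum(
        math.sqrt(sum(w * w for row in matrix for w in row)) for matrix in weights
    )
    return cross_entropy + l1_reg * l1_penalty + l2_reg * l2_penalty
```

With both weights left at their default of 0, the objective reduces to the plain cross-entropy.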

Example

python pdnn/cmds2/run_DNN_SAT.py --train-data "train.pfile.gz,partition=2000m,stream=true,random=true" \
                                 --valid-data "valid.pfile.gz,partition=600m,stream=true,random=true" \
                                 --si-nnet-spec "330:1024:1024:1024:1024:1901" \
                                 --adapt-nnet-spec "100:512:512" \
                                 --init-model mdl.init --wdir ./ \
                                 --param-output-file nnet.mdl --cfg-output-file nnet.cfg

In this example, the SI model has the architecture 330:1024:1024:1024:1024:1901, and the Adaptation network has the architecture 100:512:512:330. That is, an additional layer whose size equals the SI model's input dimension is automatically appended to the Adaptation network; this layer uses a linear activation function. After training finishes, you will find the model files nnet.mdl.si and nnet.cfg.si for the SI model, and nnet.mdl.adapt and nnet.cfg.adapt for the Adaptation model.
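
The way the two specification strings and the output prefixes combine can be sketched in a few lines of Python. This is an illustration of the rule described above, not code taken from PDNN; the helper names parse_spec, sat_architectures and output_files are hypothetical.

```python
def parse_spec(spec):
    """Split a colon-separated layer spec such as "100:512:512" into ints."""
    return [int(size) for size in spec.split(":")]


def sat_architectures(si_spec, adapt_spec):
    """Return (SI layer sizes, Adaptation layer sizes).

    The Adaptation network gets one extra output layer whose size equals
    the SI model's input dimension, as described in the text.
    """
    si_layers = parse_spec(si_spec)
    adapt_layers = parse_spec(adapt_spec) + [si_layers[0]]
    return si_layers, adapt_layers


def output_files(prefix):
    """Map a non-empty output prefix to its ".si" and ".adapt" filenames."""
    if not prefix:
        # An empty prefix (the default "") means no file is written.
        return {}
    return {"si": prefix + ".si", "adapt": prefix + ".adapt"}
```

For the command above, sat_architectures("330:1024:1024:1024:1024:1901", "100:512:512") gives the Adaptation layers [100, 512, 512, 330], and output_files("nnet.mdl") gives nnet.mdl.si and nnet.mdl.adapt.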