| argument | meaning | default value / comment |
| --- | --- | --- |
| --train-data | training data specification | required |
| --nnet-spec | `--nnet-spec="d:h(1):h(2):...:h(n):s"`, e.g. `250:1024:1024:1024:1024:1920` | required. d: input dimension; h(i): size of the i-th hidden layer; s: number of targets |
| --output-file | path to save the resulting network | required |
| --wdir | working directory | required |
| --param-output-file | path to save model parameters in the PDNN format | by default "": doesn't output the PDNN-formatted model |
| --cfg-output-file | path to save the model config | by default "": doesn't output the model config |
| --kaldi-output-file | path to save the Kaldi-formatted model | by default "": doesn't output the Kaldi-formatted model |
| --corruption-level | corruption factor for binary random masking | by default 0.2 |
| --learning-rate | learning rate; kept constant throughout training | by default 0.01 |
| --epoch-number | number of training epochs | by default 10 |
| --batch-size | mini-batch size during training | by default 128 |
| --momentum | momentum factor | by default 0.5 |
| --ptr-layer-number | number of layers to be pre-trained | by default all the hidden layers are pre-trained |
| --sparsity | together with --sparsity-weight, turns each layer of the SdA into a sparse autoencoder. The implementation follows this paper: sparsity and sparsity-weight correspond to rho (page 14) and beta (page 15) in the paper; see the sketch below this table | by default both parameters are None: no sparsity is imposed |
| --sparsity-weight | weight of the sparsity penalty; see --sparsity | by default None |
| --hidden-activation | hidden activation function | by default sigmoid |
| --1stlayer-reconstruct-activation | reconstruction activation function for the first layer; sigmoid and tanh are currently supported | by default sigmoid. If your inputs are mean (and sometimes variance) normalized, you need to use tanh for feature reconstruction |
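For reference, an invocation combining the flags above might look like the following. This is a sketch only: the script name `run_SdA.py`, the paths, and the `--train-data` specification string are placeholders for your own setup (see the data-format documentation for the exact data-specification syntax).

```
# Sketch only: script name, paths, and the --train-data string are
# placeholders; adjust them to your own PDNN checkout and data.
python run_SdA.py --train-data "train.pfile.gz,partition=600m,random=true" \
                  --nnet-spec "250:1024:1024:1024:1024:1920" \
                  --wdir ./work --output-file ./work/sda.nnet \
                  --param-output-file ./work/sda.pdnn \
                  --corruption-level 0.2 --learning-rate 0.01 \
                  --epoch-number 10 --batch-size 128 --momentum 0.5 \
                  --ptr-layer-number 4 --hidden-activation sigmoid \
                  --1stlayer-reconstruct-activation tanh
```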
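On the two sparsity flags: assuming the referenced paper uses the standard sparse-autoencoder formulation (an assumption, since the link names only rho and beta), they add a KL-divergence penalty to each layer's reconstruction cost, with rho (`--sparsity`) the target mean activation and beta (`--sparsity-weight`) the penalty weight:

```latex
% Sketch under the assumption of the standard sparse-autoencoder penalty:
% \rho = --sparsity, \beta = --sparsity-weight, and \hat{\rho}_j is the
% mean activation of hidden unit j over the training cases.
J_{\mathrm{sparse}} = J_{\mathrm{recon}}
  + \beta \sum_{j=1}^{h} \mathrm{KL}\!\left(\rho \,\middle\|\, \hat{\rho}_j\right),
\qquad
\mathrm{KL}\!\left(\rho \,\middle\|\, \hat{\rho}_j\right)
  = \rho \log\frac{\rho}{\hat{\rho}_j}
  + (1 - \rho)\log\frac{1 - \rho}{1 - \hat{\rho}_j}
```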