| argument | meaning/value | comments |
| -------- | ------------- | -------- |
| --train-data | training data specification | required |
| --valid-data | validation data specification | required |
| --conv-nnet-spec | net specification for the convolutional layers<br/>--conv-nnet-spec="txnxm:a,bxc,pdxe,f"<br/>E.g., "1x29x29:64,4x4,p2x2:128,5x5,p3x3,f" stacks two convolutional layers | required<br/>"txnxm": the inputs are t feature maps, each with the dimension n x m<br/>"a,bxc,pdxe,f" describes one convolution layer: a -- number of feature maps; bxc -- size of the local filters (kernels); dxe -- pooling size; if "f" appears, the outputs are flattened<br/>you can continue to stack more convolution layers (see the shape walkthrough below the table) |
| --nnet-spec | net specification for the FC layers<br/>--nnet-spec="h(1):h(2):...:h(n):s"<br/>E.g., 1024:1024:1024:1920 | required<br/>h(i) -- size of the i-th FC hidden layer; s -- number of targets |
| --wdir | working directory | required |
| --param-output-file | path to save model parameters in the PDNN format | by default "": doesn't output the PDNN-formatted model |
| --cfg-output-file | path to save the model config | by default "": doesn't output the model config |
| --kaldi-output-file | path to save the Kaldi-formatted model | by default "": doesn't output the Kaldi-formatted model |
| --model-save-step | number of epochs between model saving | by default 1: save the tmp model after each epoch |
| --ptr-file | pre-trained model file | by default "": no pre-training |
| --ptr-layer-number | number of layers to initialize with the pre-trained model | required if --ptr-file is provided |
| --lrate | learning rate | by default D:0.08:0.5:0.05,0.05:15 |
| --batch-size | mini-batch size for SGD | by default 256 |
| --momentum | the momentum | by default 0.5 |
| --use-fast | whether to use the fast version of CNN | by default false; more details at the bottom of this page |
| --conv-activation | activation function for the convolutional layers; more details on the DNN webpage | by default sigmoid |
| --activation | activation function for the FC layers; more details on the DNN webpage | by default sigmoid |
| --input-dropout-factor | dropout factor for the input layer (features) | by default 0: no dropout is applied to the input features |
| --dropout-factor | comma-delimited dropout factors for *hidden layers*; the number of factors must match the network structure (nnet-spec)<br/>E.g., --dropout-factor 0.2,0.2,0.2,0.2 | by default "": no dropout is applied. This is equivalent to setting all dropout factors to 0, but the explicit setting runs slower; "--dropout-factor 0,0,0,0" is therefore NOT recommended (see the dropout sketch below the table) |
| --l1-reg | L1-norm regularization weight<br/>train_objective = cross_entropy + l1_reg * [L1 norm of all weight matrices] | by default 0 |
| --l2-reg | L2-norm regularization weight<br/>train_objective = cross_entropy + l2_reg * [L2 norm of all weight matrices] | by default 0 (see the regularization sketch below the table) |
| --max-col-norm | the max value of the norm of gradients; usually used with dropout and maxout (see the sketch below the table) | by default none: not applied |
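
To make the --conv-nnet-spec format concrete, here is a walkthrough of the example spec "1x29x29:64,4x4,p2x2:128,5x5,p3x3,f". It assumes non-padded ("valid") convolutions and non-overlapping pooling, which is the usual reading of such specs; the conv_shapes helper is purely illustrative and not part of PDNN.

```python
def conv_shapes(spec):
    """Trace feature-map shapes through a conv-nnet-spec string.

    Illustrative only (not PDNN code); assumes "valid" convolutions
    and non-overlapping pooling.
    """
    parts = spec.split(":")
    t, n, m = (int(x) for x in parts[0].split("x"))     # txnxm: input maps
    print(f"input: {t} feature maps of size {n}x{m}")
    for layer in parts[1:]:
        fields = layer.split(",")
        a = int(fields[0])                              # a: number of feature maps
        b, c = (int(x) for x in fields[1].split("x"))   # bxc: filter size
        n, m = n - b + 1, m - c + 1                     # valid convolution
        if len(fields) > 2 and fields[2].startswith("p"):
            d, e = (int(x) for x in fields[2][1:].split("x"))
            n, m = n // d, m // e                       # dxe: pooling size
        t = a
        print(f"after conv layer: {t} maps of size {n}x{m}")
        if fields[-1] == "f":                           # flatten the outputs
            print(f"flattened output: {t * n * m} units")

conv_shapes("1x29x29:64,4x4,p2x2:128,5x5,p3x3,f")
```

Under these assumptions, 29x29 inputs shrink to 13x13 maps after the first layer and 3x3 maps after the second, so the flattened output has 128 * 3 * 3 = 1152 units; that vector is what the first FC layer in --nnet-spec receives.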
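
--dropout-factor assigns one factor per hidden layer. The snippet below is a minimal sketch of standard inverted dropout with per-layer factors, written with NumPy; it illustrates the semantics only and is not PDNN's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, factor):
    """Inverted dropout: zero a `factor` fraction of units, rescale the rest."""
    if factor == 0.0:
        return h                      # matches the default "": no dropout
    mask = rng.random(h.shape) >= factor
    return h * mask / (1.0 - factor)

# One factor per hidden layer, as in --dropout-factor 0.2,0.2,0.2,0.2
factors = [float(f) for f in "0.2,0.2,0.2,0.2".split(",")]
h = np.ones((8, 1024))                # stand-in for a batch of hidden activations
for f in factors:
    h = dropout(h, f)                 # in a real net, a layer transform runs between these
```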
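
The --l1-reg and --l2-reg rows give the training objective directly. The sketch below mirrors those formulas with NumPy, treating the L2 term as the Frobenius norm of each weight matrix; whether PDNN penalizes the norm or its square is not stated here, so take the exact form as an assumption.

```python
import numpy as np

def train_objective(cross_entropy, weights, l1_reg=0.0, l2_reg=0.0):
    """train_objective = cross_entropy
                       + l1_reg * [L1 norm of all weight matrices]
                       + l2_reg * [L2 norm of all weight matrices]"""
    l1_norm = sum(np.abs(W).sum() for W in weights)
    l2_norm = sum(np.linalg.norm(W) for W in weights)  # Frobenius norm per matrix
    return cross_entropy + l1_reg * l1_norm + l2_reg * l2_norm

weights = [np.ones((1152, 1024)), np.ones((1024, 1024))]
print(train_objective(1.5, weights))  # defaults l1_reg = l2_reg = 0: plain cross-entropy
```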
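
The --max-col-norm row describes a cap on gradient norms; in several related toolkits, a flag of this name instead rescales the columns of each weight matrix after an update. The sketch below shows that column-rescaling variant as one common interpretation of the constraint used with dropout and maxout; it is an assumption, not PDNN's confirmed behavior.

```python
import numpy as np

def apply_max_col_norm(W, max_col_norm):
    """Rescale any column of W whose L2 norm exceeds max_col_norm
    (a common max-norm constraint; interpretation assumed, see above)."""
    norms = np.linalg.norm(W, axis=0)
    scale = np.minimum(1.0, max_col_norm / np.maximum(norms, 1e-12))
    return W * scale

W = np.random.default_rng(0).normal(size=(1024, 512)) * 5.0
W = apply_max_col_norm(W, max_col_norm=1.0)
print(np.linalg.norm(W, axis=0).max())  # every column norm is now <= 1.0
```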