Documentation: Multi-Task Learning

cmds/run_MTL.py -- Multi-Task Learning

---------------------------------------------------------------------------------------------------------------------

Arguments

argument	meaning/value	default value / comments
--train-data	training data specification	required. Data paths for different tasks are separated by "|".
--valid-data	validation data specification	required. Data paths for different tasks are separated by "|".
--task-number	number of tasks being trained (used for verification)	required. Its value must equal the number of tasks given in --train-data and --valid-data.
--shared-nnet-spec	--shared-nnet-spec="d:h(1):h(2):...:h(m)", e.g. 250:1024:1024:1024	required. Specifies the structure of the lower layers shared across tasks. d: input dimension; h(i): size of the i-th hidden layer.
--indiv-nnet-spec	--indiv-nnet-spec="h(1)(n):s(1)|...|h(T)(n):s(T)", e.g. 1024:1920|1024:1887|1024:1790	required. Specifies the task-specific upper layers, separated by "|". Although only one hidden layer h(t)(n) is shown, each task can have an arbitrary upper-layer architecture. h(t)(n): size of the n-th hidden layer for task t; s(t): number of targets for task t.
--wdir	working directory	required

--param-output-file	(prefix) path to save model parameters in the PDNN format	by default "": doesn't output PDNN-formatted model. Filename for each task is appended with the suffix ".task#"
--cfg-output-file	(prefix) path to save model config	by default "": doesn't output model config. Filename for each task is appended with the suffix ".task#"
--kaldi-output-file	(prefix) path to save the Kaldi-formatted model	by default "": doesn't output Kaldi-formatted model. Filename for each task is appended with the suffix ".task#"
--model-save-step	number of epochs between model saving	by default 1: save a temporary model after each epoch

--ptr-file	pre-trained model file	by default "": no pre-training
--ptr-layer-number	how many layers to be initialized with the pre-trained model	required if --ptr-file is provided

--lrate	learning rate	by default D:0.08:0.5:0.05,0.05:15
--batch-size	mini-batch size for SGD	256
--momentum	the momentum	0.5

--activation	the same as dnn	by default sigmoid

--input-dropout-factor	the same as dnn	by default 0: no dropout is applied to the input features
--dropout-factor	the same as dnn	by default "": no dropout is applied.

--l1-reg	L1-norm regularization weight: train_objective = cross_entropy + l1_reg * [l1 norm of all weight matrices]	by default 0
--l2-reg	L2-norm regularization weight: train_objective = cross_entropy + l2_reg * [l2 norm of all weight matrices]	by default 0
--max-col-norm	the max value of norm of gradients; usually used in dropout and maxout	by default none: not applied
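
The relationship between --task-number, the "|"-separated data specifications, and the ".task#" output suffix can be illustrated with a minimal Python sketch. The helper names below (split_tasks, check_task_number, output_files) are hypothetical and are not part of PDNN; the sketch only mirrors the rules stated in the table, and it assumes tasks are numbered from 1, as in the example file names nnet.mdl.task1 and nnet.mdl.task2.

```python
def split_tasks(spec):
    """Split a multi-task specification on the "|" task separator."""
    return [part.strip() for part in spec.split("|")]


def check_task_number(train_data, valid_data, task_number):
    """Return True if both data specs describe exactly task_number tasks."""
    return (len(split_tasks(train_data)) == task_number
            and len(split_tasks(valid_data)) == task_number)


def output_files(prefix, task_number):
    """List per-task output paths: the prefix plus ".task1", ".task2", ..."""
    if prefix == "":
        return []  # an empty prefix means no file of this kind is written
    # Assumption: task numbering starts at 1.
    return [f"{prefix}.task{i}" for i in range(1, task_number + 1)]
```

For instance, check_task_number("a.gz|b.gz", "c.gz|d.gz", 2) holds, whereas passing 3 as the task number would fail the check.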

Example

python pdnn/cmds/run_MTL.py --train-data "train.pickle.T1.gz|train.pickle.T2.gz,partition=600m,random=true" \
                            --valid-data "valid.pickle.T1.gz|valid.pickle.T2.gz,partition=600m,random=true" \
                            --task-number 2 --wdir ./ \
                            --shared-nnet-spec "330:1024:1024:1024" --indiv-nnet-spec "1024:1920|1024:1887" \
                            --activation sigmoid \
                            --param-output-file nnet.mdl --cfg-output-file nnet.cfg

In this example, we perform multi-task learning on two tasks. Their DNNs have the structures 330:1024:1024:1024:1024:1920 and 330:1024:1024:1024:1024:1887, respectively. The lower 3 hidden layers (330:1024:1024:1024) are shared by the two tasks. After training finishes, you will find the model files nnet.mdl.task1 and nnet.cfg.task1 for task 1, and nnet.mdl.task2 and nnet.cfg.task2 for task 2.
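
To make the composition of shared and task-specific layers concrete, the following Python sketch joins a shared specification with each task's individual specification to obtain the full per-task structure. The function full_specs is a hypothetical helper written for illustration, not PDNN's own parsing code; it assumes the specs use ":" between layer sizes and "|" between tasks, as described above.

```python
def full_specs(shared_spec, indiv_spec):
    """Return one list of layer sizes per task: shared layers + task layers."""
    shared = [int(x) for x in shared_spec.split(":")]
    tasks = []
    for part in indiv_spec.split("|"):
        upper = [int(x) for x in part.split(":")]
        tasks.append(shared + upper)
    return tasks


specs = full_specs("330:1024:1024:1024", "1024:1920|1024:1887")
for t, layers in enumerate(specs, start=1):
    print(f"task{t}: " + ":".join(str(n) for n in layers))
# task1: 330:1024:1024:1024:1024:1920
# task2: 330:1024:1024:1024:1024:1887
```

The first four entries of each list are identical because they come from the shared specification; only the last hidden layer and the output layer differ between the tasks.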