| argument | meaning/value | comments |
|---|---|---|
| --train-data | training data specification | required |
| --valid-data | validation data specification | required |
| --nnet-spec | --nnet-spec="d:h(1):h(2):...:h(n):s", e.g. 250:1024:1024:1024:1024:1920 | required. d: input dimension; h(i): size of the i-th hidden layer; s: number of targets |
| --wdir | working directory | required |
| --param-output-file | path to save model parameters in the PDNN format | by default "": doesn't output the PDNN-formatted model |
| --cfg-output-file | path to save the model config | by default "": doesn't output the model config |
| --kaldi-output-file | path to save the Kaldi-formatted model | by default "": doesn't output the Kaldi-formatted model |
| --model-save-step | number of epochs between model saving | by default 1: save the tmp model after each epoch |
| --ptr-file | pre-trained model file | by default "": no pre-training |
| --ptr-layer-number | how many layers to initialize with the pre-trained model | required if --ptr-file is provided |
| --lrate | learning rate | by default D:0.08:0.5:0.05,0.05:15 |
| --batch-size | mini-batch size for SGD | by default 256 |
| --momentum | the momentum | by default 0.5 |
| --activation | 1. sigmoid 2. tanh 3. rectifier 4. maxout:${group_size} | by default sigmoid. When using maxout, you need to specify the group size, i.e., the number of units in each max-pooling group. More details can be found at the bottom of this page and also in this paper |
| --input-dropout-factor | dropout factor for the input layer (features) | by default 0: no dropout is applied to the input features |
| --dropout-factor | comma-delimited dropout factors for *hidden layers*; they must match the network structure (nnet-spec), e.g. --dropout-factor 0.2,0.2,0.2,0.2 | by default "": no dropout is applied. This is equivalent to setting all dropout factors to 0, but the latter case is slower; thus "--dropout-factor 0,0,0,0" is NOT recommended |
| --l1-reg | L1-norm regularization weight: train_objective = cross_entropy + l1_reg * [L1 norm of all weight matrices] | by default 0 |
| --l2-reg | L2-norm regularization weight: train_objective = cross_entropy + l2_reg * [L2 norm of all weight matrices] | by default 0 |
| --max-col-norm | the max value of the norm of gradients; usually used with dropout and maxout | by default none: not applied |
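Putting the arguments above together, a training invocation might look like the following. This is an illustrative sketch only: the script name (`run_DNN.py`) and the data/output paths are assumptions, so substitute the actual training script and your own files.

```shell
# Hypothetical PDNN training command; script name and file paths are
# placeholders -- only the flags themselves come from the table above.
python run_DNN.py --train-data "train.pfile" \
                  --valid-data "valid.pfile" \
                  --nnet-spec "250:1024:1024:1024:1024:1920" \
                  --wdir ./work \
                  --lrate "D:0.08:0.5:0.05,0.05:15" \
                  --batch-size 256 \
                  --momentum 0.5 \
                  --activation sigmoid \
                  --input-dropout-factor 0.2 \
                  --dropout-factor 0.2,0.2,0.2,0.2 \
                  --kaldi-output-file ./work/dnn.nnet
```

Note that the four comma-delimited dropout factors match the four hidden layers declared in --nnet-spec, as the table requires.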