CMU 11-731 (MT & Seq2Seq): RNN Language Models

Long Distance Dependencies in Language

  • examples
    • grammatical constraint
    • selectional preferences
    • topic / register

RNN

$$ h_t=\begin{cases}\tanh\left(W_{xh}x_t+W_{hh}h_{t-1}+b_h\right)&\text{if }t\geq 1\\0&\text{otherwise}\end{cases} $$
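
The recurrence above can be sketched in plain Python. This is a minimal illustration, not the course's implementation: the function names `rnn_step` and `run_rnn` are hypothetical, vectors are lists of floats, and matrices are lists of rows.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h_t = []
    for i in range(len(b_h)):
        pre = b_h[i]
        pre += sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        pre += sum(W_hh[i][j] * h_prev[j] for j in range(len(h_prev)))
        h_t.append(math.tanh(pre))
    return h_t

def run_rnn(xs, W_xh, W_hh, b_h):
    """Run the RNN over a sequence of input vectors, starting from h_0 = 0."""
    h = [0.0] * len(b_h)
    states = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states
```

For example, `run_rnn([[0.5], [0.0]], [[1.0]], [[1.0]], [0.0])` returns the two hidden states of a one-dimensional RNN, where the second state depends on the first only through `W_hh`.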

  • language model using RNN

$$ m_t=M_{\cdot,e_{t-1}} $$
$$ h_t=\mathrm{RNN}\left(m_t,h_{t-1}\right) $$
$$ p_t=\mathrm{softmax}\left(W_{hs}h_t+b_s\right) $$
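
A minimal sketch of one language-model step following the three equations above. Several things here are assumptions made for illustration: the name `lm_step`, the `params` dictionary layout, storing `M` as a list of rows so that a word's embedding is a column, and using the simple tanh RNN as the `RNN` function.

```python
import math

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def lm_step(prev_word, h_prev, params):
    """Return (p_t, h_t) given the previous word id e_{t-1} and h_{t-1}."""
    # m_t = M[:, e_{t-1}]: look up the embedding column of the previous word
    m_t = [row[prev_word] for row in params["M"]]
    # h_t = RNN(m_t, h_{t-1}), here a plain tanh RNN
    pre = [a + b + c for a, b, c in zip(matvec(params["W_xh"], m_t),
                                        matvec(params["W_hh"], h_prev),
                                        params["b_h"])]
    h_t = [math.tanh(p) for p in pre]
    # p_t = softmax(W_hs h_t + b_s): distribution over the vocabulary
    scores = [s + b for s, b in zip(matvec(params["W_hs"], h_t), params["b_s"])]
    return softmax(scores), h_t
```

Calling `lm_step` repeatedly, feeding each returned `h_t` back in as `h_prev`, yields one probability distribution over the next word per position.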

The Vanishing Gradient and Long Short-term Memory

  • RNN problems
    • vanishing gradient
    • exploding gradient
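      • solution: gradient clipping
    • why?
      $$ \frac{\partial h_t}{\partial h_k}=\prod_{j=k+1}^t\frac{\partial h_j}{\partial h_{j-1}}=\prod_{j=k+1}^t\mathrm{diag}\left(\tanh'\left(a_j\right)\right)W_{hh},\quad\text{where }a_j=W_{xh}x_j+W_{hh}h_{j-1}+b_h\text{ and }\tanh'\leq 1 $$
    • how to solve? -> LSTM (the derivative of the memory cell with respect to its previous value is exactly one)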
      • add memory cell
        $$ \frac{\partial c_t}{\partial c_{t-1}}=\;1 $$
      • add two gates
        • input gate
        • output gate
      • key equation
        $$ c_t=i_t\odot u_t+c_{t-1} $$
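
The effect of the repeated product can be checked numerically with a scalar RNN. The sketch below is only illustrative: the function names are hypothetical, and the pre-activations are held fixed by assumption, whereas in a real network they depend on the weights and inputs. It also includes a simple norm-based gradient clipping helper.

```python
import math

def gradient_through_time(w_hh, pre_activations):
    """Scalar RNN: d h_t / d h_k = product over steps of tanh'(a_j) * w_hh."""
    grad = 1.0
    for a in pre_activations:
        grad *= (1.0 - math.tanh(a) ** 2) * w_hh  # tanh'(a) = 1 - tanh(a)^2
    return grad

def clip_gradient(grad, max_norm):
    """Rescale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        return [g * max_norm / norm for g in grad]
    return list(grad)

# 50 time steps: a small recurrent weight shrinks the gradient toward zero,
# a large one blows it up, and the additive memory cell keeps a factor of 1.
vanishing = gradient_through_time(0.9, [0.5] * 50)
exploding = gradient_through_time(3.0, [0.0] * 50)
additive_cell = 1.0 ** 50
```

With these settings `vanishing` is far below one and `exploding` is astronomically large, while the memory-cell path stays at exactly one regardless of the number of steps.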

Other RNN Variants

  • add a forget gate (lets the cell clear its memory when justified)
    • problem
      • if the forget gate f is close to zero early in training, the model forgets everything.
        • solution: initialize the forget gate's bias b to a large value (e.g. 1)
        • idea: start out remembering, and learn to forget gradually during training
  • GRU

    • fewer parameters
    • no concept of “cell”
    • Recurrent highway networks
  • stack RNNs

    • each layer is recurrent over time and also takes the output of the layer below as its input
    • progressively extract more abstract features (e.g. POS -> voice/tense)
    • problem
      • vanishing gradient problem in the vertical direction
      • solution: Residual Network
        • idea: add the output of the previous layer to the output of the next layer
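
Two of the ideas above can be sketched in a few lines: the memory-cell update with a forget gate whose bias starts at 1, and residual connections between stacked layers. The names `cell_update`, `f_pre`, `b_f`, and `residual_stack` are hypothetical, the cell is a single scalar for simplicity, and each layer is assumed to be a function from a list of floats to a list of the same length.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cell_update(c_prev, i_t, u_t, f_pre, b_f=1.0):
    """Memory cell with a forget gate: c_t = f_t * c_{t-1} + i_t * u_t.

    f_pre is the forget gate's pre-activation without its bias. With b_f = 1
    and f_pre near zero, f_t is about 0.73, so the cell mostly remembers.
    """
    f_t = sigmoid(f_pre + b_f)
    return f_t * c_prev + i_t * u_t, f_t

def residual_stack(x, layers):
    """Apply layers in order, adding each layer's input to its output."""
    h = x
    for layer in layers:
        out = layer(h)
        h = [o + v for o, v in zip(out, h)]  # skip connection
    return h
```

Because each layer's input is added back in, a layer that outputs zeros leaves the signal unchanged, which gives gradients a direct path through the stack in the vertical direction.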

Online, Batch, and Minibatch Training

  • online learning (perform an update after each single training example)
  • batch learning (perform one update after processing the entire training set)
  • minibatch learning (a middle ground between online and batch)
    • benefit
      • use efficient vector processing instructions
    • when batching, how to handle sentences of different lengths?
      • sentence padding (add “eos” symbols to the shorter sentences)
      • masking
      • problem
        • waste a lot of computation on these padded symbols
        • solution
          • sort the sentences in the corpus by length
    • PyTorch (Matchbox)
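
Padding, masking, and sorting by length can be sketched as follows. This is a minimal illustration in plain Python rather than any library's API: the function names are hypothetical, sentences are assumed to be lists of integer word ids, and `pad_id=0` is an assumed id for the padding ("eos") symbol.

```python
def make_minibatches(sentences, batch_size, pad_id=0):
    """Sort sentences by length, then cut them into padded minibatches.

    Returns a list of (padded, mask) pairs, where mask holds 1 for real
    tokens and 0 for padding.
    """
    ordered = sorted(sentences, key=len)  # similar lengths end up together
    batches = []
    for start in range(0, len(ordered), batch_size):
        chunk = ordered[start:start + batch_size]
        max_len = max(len(s) for s in chunk)
        padded = [s + [pad_id] * (max_len - len(s)) for s in chunk]
        mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in chunk]
        batches.append((padded, mask))
    return batches

def masked_loss(token_losses, mask):
    """Sum per-token losses while ignoring the padded positions."""
    return sum(l * m
               for row_l, row_m in zip(token_losses, mask)
               for l, m in zip(row_l, row_m))
```

For the four sentences `[[5, 6, 7, 8], [1], [2, 3], [4, 9, 10]]` and a batch size of 2, sorting first leaves two padded positions in total, whereas batching them in the given order would need four.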

Further Reading

Murmur

So busy recently ……. 😵
