A neural network is a universal function approximator: with enough hidden units it can approximate any continuous function arbitrarily well.
Potential and Problems with Combination Features
- Separate feature sets
    - cannot rule out unnatural phrases (each feature scores independently)
- Combined feature sets
    - the number of parameters explodes combinatorially
- Algorithms that handle feature combinations
    - kernelized SVMs
    - neural networks (see the sketch after this list)
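A minimal sketch of the problem (the XOR-style data here is illustrative): a linear model over separate features cannot represent an interaction between two features, while adding an explicit combination feature fixes it, at the cost of extra parameters for every pair.

```python
import numpy as np

# XOR-style interaction: the label is 1 iff exactly one of the two features fires.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Separate features only: the best linear fit leaves a large residual.
Xb = np.hstack([X, np.ones((4, 1))])            # add a bias column
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print("separate features, residual:", ((Xb @ w - y) ** 2).sum())   # 1.0

# Add the combination feature x1*x2: the fit becomes exact, but doing this
# for all feature pairs blows up the parameter count.
Xc = np.hstack([Xb, X[:, :1] * X[:, 1:2]])
w2, *_ = np.linalg.lstsq(Xc, y, rcond=None)
print("with combination feature, residual:", ((Xc @ w2 - y) ** 2).sum())  # ~0
```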
Overview
- MLP (multi-layer perceptron)
- Space mapping: each hidden layer maps its input into a new space where the task becomes easier, e.g. linearly separable (see the sketch below)
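A minimal sketch of space mapping (the weights are hand-set, purely illustrative): one ReLU layer maps the XOR points into a hidden space where the classes become linearly separable.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

# XOR inputs: not linearly separable in the original space.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hand-set hidden layer (illustrative values): h = relu(X @ W + b)
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
b = np.array([0.0, -1.0])
H = relu(X @ W + b)
print(H)
# [[0. 0.]   label 0
#  [1. 0.]   label 1
#  [1. 0.]   label 1
#  [2. 1.]]  label 0
# In the hidden space, the linear rule y = h1 - 2*h2 separates the classes.
```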
Training NNs
- standard loop: forward pass, compute loss, backward pass, parameter update (see the sketch below)
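A minimal training-loop sketch in PyTorch (one of the toolkits listed later); the architecture and hyperparameters are illustrative, and on this toy XOR task it usually converges:

```python
import torch
import torch.nn as nn

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

model = nn.Sequential(nn.Linear(2, 8), nn.Tanh(), nn.Linear(8, 1))
opt = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

for step in range(2000):
    opt.zero_grad()               # clear old gradients
    loss = loss_fn(model(X), y)   # forward pass + loss
    loss.backward()               # backprop through the computation graph
    opt.step()                    # gradient update

print(model(X).detach().round().squeeze())  # usually ~ tensor([0., 1., 1., 0.])
```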
Non-linear functions
- ReLU: $\max(0, x)$; gradient stays 1 for positive inputs
- Tanh: saturates for large $|x|$, so gradients vanish (see the sketch below)
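A quick numeric sketch of saturation (sample points are illustrative): tanh's derivative, $1 - \tanh^2(x)$, is nearly zero for large $|x|$, while ReLU's gradient stays 1 wherever the unit is active.

```python
import numpy as np

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])

tanh_grad = 1 - np.tanh(x) ** 2        # d/dx tanh(x)
relu_grad = (x > 0).astype(float)      # 1 where active, 0 elsewhere

print(tanh_grad)  # [~0, 0.07, 1.0, 0.07, ~0] -> saturated at the ends
print(relu_grad)  # [0, 0, 0, 1, 1]           -> no saturation for x > 0
```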
How do we compute complicated derivatives on a computer?
- Automatic differentiation (autodiff)
- Articles
    - [1991] Andreas Griewank. Automatic differentiation of algorithms: theory, implementation, and application.
    - [1964] R. E. Wengert. A simple automatic derivative evaluation program.
- Computation graph
- Two-step dynamic programming algorithm (see the sketch below):
    - Forward calculation: compute and cache the value of every node in the graph
    - Back propagation: propagate derivatives from the output back through the graph
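A minimal sketch of reverse-mode autodiff as two-pass dynamic programming (the function and the name `f_autodiff` are illustrative): the forward pass caches every intermediate value, the backward pass accumulates adjoints in reverse order, summing over all paths into a node.

```python
import math

# f(x, y) = (x + y) * tanh(x), written as an explicit computation graph.
def f_autodiff(x, y):
    # Forward pass: compute and cache every intermediate value.
    a = x + y          # node a
    t = math.tanh(x)   # node t
    z = a * t          # output node
    # Backward pass: propagate adjoints dz/d(node) from the output to the inputs.
    dz = 1.0
    da = dz * t                        # dz/da = t
    dt = dz * a                        # dz/dt = a
    dx = da * 1.0 + dt * (1 - t * t)   # x feeds both a and tanh: sum the two paths
    dy = da * 1.0
    return z, dx, dy

z, dx, dy = f_autodiff(1.5, -0.5)
eps = 1e-6  # sanity check against finite differences
print(dx, (f_autodiff(1.5 + eps, -0.5)[0] - z) / eps)
print(dy, (f_autodiff(1.5, -0.5 + eps)[0] - z) / eps)
```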
Toolbox
- Chainer
- DyNet
- MXNet
- PyTorch
- TensorFlow
- Theano
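All of these toolkits expose autodiff directly; a minimal PyTorch sketch (values illustrative) that reproduces the hand-derived gradients from the computation-graph example above:

```python
import torch

# Same function as above: z = (x + y) * tanh(x), but the framework
# builds the computation graph and runs backpropagation for us.
x = torch.tensor(1.5, requires_grad=True)
y = torch.tensor(-0.5, requires_grad=True)

z = (x + y) * torch.tanh(x)
z.backward()             # reverse-mode autodiff

print(x.grad, y.grad)    # matches the hand-derived dx, dy
```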
Why are neural language models good?
- Better generalization across contexts: similar words share similar embeddings
- More generalizable combination of words into contexts (see the sketch below)
- Ability to skip previous words
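A minimal sketch of a feed-forward neural LM (Bengio-style; the class name `FFLM` and all sizes are illustrative): previous words are embedded, concatenated, and fed through a hidden layer, so contexts built from similar words get similar representations.

```python
import torch
import torch.nn as nn

V, emb, hid, n = 100, 16, 32, 2   # vocab, embed dim, hidden dim, context size

class FFLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(V, emb)          # shared word embeddings
        self.hidden = nn.Linear(n * emb, hid)
        self.out = nn.Linear(hid, V)               # scores over the full vocab

    def forward(self, ctx):                        # ctx: (batch, n) word ids
        e = self.embed(ctx).view(ctx.size(0), -1)  # concat previous-word embeddings
        return self.out(torch.tanh(self.hidden(e)))

lm = FFLM()
logits = lm(torch.tensor([[5, 7]]))   # scores for P(next | w_{t-2}=5, w_{t-1}=7)
print(logits.shape)                   # torch.Size([1, 100])
```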
Further Reading
Softmax approximations (e.g., negative sampling, noise-contrastive estimation)
Other softmax structures
- Class-based softmax: $O\left(\left|C\right| + \left|V_{c_t}\right|\right)$
    - predict the word's class first, then the word within that class (see the sketch below)
- Hierarchical softmax: $O\left(\log_2\left|V\right|\right)$
    - a binary tree of decisions over the vocabulary
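A minimal sketch of class-based softmax (all names, sizes, and the word-to-class scheme are illustrative; a real model would also use a separate output layer per class): instead of normalizing over all $|V|$ words, we score $|C|$ classes plus only the words inside the target word's class.

```python
import torch
import torch.nn as nn

V, C, hid = 10000, 100, 64       # vocab size, number of classes, hidden dim
words_per_class = V // C         # toy scheme: class of word w is w // words_per_class

class_scorer = nn.Linear(hid, C)               # |C| outputs
word_scorer = nn.Linear(hid, words_per_class)  # |V_c| outputs (shared here for brevity)

def log_prob(h, w):
    """log P(w|h) = log P(c|h) + log P(w|c,h): costs O(|C| + |V_c|), not O(|V|)."""
    c, w_in_c = w // words_per_class, w % words_per_class
    log_p_class = torch.log_softmax(class_scorer(h), dim=-1)[c]
    log_p_word = torch.log_softmax(word_scorer(h), dim=-1)[w_in_c]
    return log_p_class + log_p_word

print(log_prob(torch.randn(hid), 4242))
```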
Other models to learn word representations
- continuous bag-of-words (CBOW): predict a word from its surrounding context
- skip-gram: predict each surrounding context word from the center word (see the sketch below)
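A minimal sketch contrasting the two training objectives (the sentence and window size are illustrative): CBOW builds (context → center word) examples, while skip-gram builds (center word → context word) pairs.

```python
sentence = "the cat sat on the mat".split()
window = 1  # illustrative context size

cbow_examples, skipgram_pairs = [], []
for i, w in enumerate(sentence):
    ctx = [sentence[j]
           for j in range(max(0, i - window), min(len(sentence), i + window + 1))
           if j != i]
    cbow_examples.append((ctx, w))           # CBOW: context -> center word
    skipgram_pairs += [(w, c) for c in ctx]  # skip-gram: center -> each context word

print(cbow_examples[1])    # (['the', 'sat'], 'cat')
print(skipgram_pairs[:3])  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
```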