- reference
Adapting sequence-to-sequence models to a particular type of problem.
Ensembling
$$ P(E\;\vert\;F)\;=\;\lambda P_1(E\;\vert\;F)\;+\;(1-\lambda)P_2(E\;\vert\;F) $$
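A minimal sketch of the interpolation above, applied to a single next-word distribution rather than a whole sentence probability (an assumption made to keep the example small). The function name `interpolate` and the toy distributions are hypothetical.

```python
def interpolate(p1, p2, lam):
    """Linearly interpolate two distributions defined over the same vocabulary.

    Computes lam * P_1 + (1 - lam) * P_2 for every word.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if p1.keys() != p2.keys():
        raise ValueError("both models must share the same vocabulary")
    return {w: lam * p1[w] + (1.0 - lam) * p2[w] for w in p1}


# Toy next-word distributions from two hypothetical models.
p1 = {"cat": 0.7, "dog": 0.2, "fish": 0.1}
p2 = {"cat": 0.1, "dog": 0.6, "fish": 0.3}

mixed = interpolate(p1, p2, lam=0.5)
```

Because both inputs sum to one and the weights sum to one, the result is again a valid distribution.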
Multi-task Learning
- Multi-task Loss Functions
$$ l(C_1,\;C_2)\;=\;\lambda_1 l_1(C_1)\;+\;\lambda_2 l_2(C_2) $$
- Task Labels
- adding domain-specific features
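The two ideas in this list can be sketched as follows, under stated assumptions: the per-task losses are already computed as plain numbers, and a task label is realized by prepending a tag token to the source sentence (one common option, not necessarily the only one intended here). The names `multi_task_loss` and `add_task_label` are hypothetical.

```python
def multi_task_loss(loss1, loss2, lam1=1.0, lam2=1.0):
    """Weighted sum of two task losses: lam1 * l_1 + lam2 * l_2."""
    return lam1 * loss1 + lam2 * loss2


def add_task_label(source_tokens, label):
    """Prepend a tag such as <medical> so the model knows the task or domain."""
    return [f"<{label}>"] + list(source_tokens)


# Toy values: the loss of each task on its own corpus.
total = multi_task_loss(2.0, 4.0, lam1=0.5, lam2=0.25)
tagged = add_task_label(["hello", "world"], "medical")
```

The weights trade off how strongly each task drives the shared parameters; setting one weight to zero recovers single-task training.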
Transfer Learning
- Continued Training
- Data Selection
- log-likelihood differential
$$ \mathrm{diff}(E)\;=\;\log P_{in}(E)\;-\;\log P_{gen}(E) $$ (in: in-domain data, gen: general-domain data)
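A minimal sketch of data selection with the log-likelihood differential. It assumes add-one-smoothed unigram language models as stand-ins for $P_{in}$ and $P_{gen}$; a real setup would use stronger language models. The toy corpora and all function names are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences, vocab):
    """Add-one-smoothed unigram model over a fixed vocabulary."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}


def log_prob(model, sentence):
    """Log-probability of a sentence under a unigram model."""
    return sum(math.log(model[w]) for w in sentence.split())


def diff(sentence, p_in, p_gen):
    """Log-likelihood differential: log P_in(E) - log P_gen(E)."""
    return log_prob(p_in, sentence) - log_prob(p_gen, sentence)


in_domain = ["the patient received treatment", "the doctor saw the patient"]
general = [
    "the cat sat on the mat",
    "the patient dog waited",
    "stocks fell on monday",
]

vocab = {w for s in in_domain + general for w in s.split()}
p_in = train_unigram(in_domain, vocab)
p_gen = train_unigram(general, vocab)

# Rank general-domain sentences so the most in-domain-like come first.
ranked = sorted(general, key=lambda s: diff(s, p_in, p_gen), reverse=True)
```

Sentences with the highest differential look more like the in-domain data than like the general data, so the top of `ranked` is what would be kept for training.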