Machine Translation, Seq2Seq and Attention
Slides
-
Early Machine Translation
- 1950s: early, rule-based
- 1990s-2010s: statistical machine translation (SMT)
- learn a probabilistic model from data
- argmax_y P(y|x), where x is the source-language sentence and y is the target-language sentence
- use Bayes' rule to break this into two components learnt separately: argmax_y P(x|y)P(y) (see the derivation sketch after this list)
- P(x|y): translation model; models how words and phrases should be translated (fidelity); learnt from parallel data
- P(y): language model; models how to write good English (fluency); learnt from monolingual data
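As a one-line check of the decomposition: P(x) does not depend on y, so Bayes' rule lets it drop out of the argmax.

```latex
\hat{y} = \operatorname*{arg\,max}_{y} P(y \mid x)
        = \operatorname*{arg\,max}_{y} \frac{P(x \mid y)\,P(y)}{P(x)}
        = \operatorname*{arg\,max}_{y} P(x \mid y)\,P(y)
```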
-
Neural Machine Translation
- NMT is a way to do MT with a single neural network
- The neural network architecture is called seq2seq and involves two RNNs
- Beam search: on each step of the decoder, keep track of the k most probable partial translations, where k is the beam size (sketched after this list)
- continue beam search until we reach timestep T, or we have at least n completed hypotheses
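A minimal sketch of that loop. The `step` function is an assumed interface returning (token, log-probability) pairs for the next token given a prefix, and the length normalization at the end is one common choice, not prescribed by the slides.

```python
def beam_search(step, bos, eos, k=5, max_len=50, n_complete=5):
    """step(prefix) -> list of (token, log_prob) for the next token (assumed interface)."""
    beam = [(0.0, [bos])]        # (cumulative log-probability, partial translation)
    completed = []
    for _ in range(max_len):     # stop condition 1: reach timestep T
        candidates = []
        for logp, prefix in beam:
            for tok, tok_logp in step(prefix):
                candidates.append((logp + tok_logp, prefix + [tok]))
        candidates.sort(key=lambda h: h[0], reverse=True)
        beam = []
        for hyp in candidates[:k]:   # keep only the k most probable partial translations
            if hyp[1][-1] == eos:
                completed.append(hyp)
            else:
                beam.append(hyp)
        if len(completed) >= n_complete or not beam:  # stop condition 2: n completed hypotheses
            break
    # Length-normalized score avoids favoring short hypotheses.
    return max(completed + beam, key=lambda h: h[0] / len(h[1]))
```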
-
Evaluating machine translation
- BLEU(Bilingual Evaluation Understudy)
- based on n-gram precision
- plus a penalty for too-short system translations
-
Attention
- Core idea: on each step of the decoder, use a direct connection to the encoder to focus on a particular part of the source sequence
- More general definition: given a set of vector values and a vector query, attention is a technique to compute a weighted sum of the values, dependent on the query
- Some variants (sketched after this list)
- Basic dot-product attention
- Multiplicative attention
- Additive attention
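Minimal numpy sketches of the three scoring functions; the weight names (W, W1, W2, w) and their shapes are illustrative assumptions. Every variant produces the same kind of output: a softmax-weighted sum of the values.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(scores, values):
    # Attention output = weighted sum of the values, weights = softmax(scores).
    return softmax(scores) @ values

def dot_product_scores(q, V):
    # Basic dot-product attention: score_i = q . v_i (query and values share a dimension).
    return V @ q

def multiplicative_scores(q, V, W):
    # Multiplicative (bilinear) attention: score_i = q^T W v_i.
    return V @ (W.T @ q)

def additive_scores(q, V, W1, W2, w):
    # Additive (MLP) attention: score_i = w^T tanh(W1 q + W2 v_i).
    return np.tanh(V @ W2.T + W1 @ q) @ w
```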
Note
-
Keyphrases: Seq2Seq and Attention Mechanisms, Neural Machine Translation, Speech Processing
-
Original Seq2Seq
- encoder: processes the input sequence in reverse
- decoder: generates the target sequence from the encoder's final state (minimal sketch below)
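A minimal sketch of that architecture, assuming PyTorch; the class layout and layer sizes are illustrative placeholders, not the lecture's code.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Bare-bones encoder-decoder in the style of Sutskever et al. (2014)."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt):
        # The original paper feeds the source sequence in reverse.
        _, state = self.encoder(self.src_emb(torch.flip(src, dims=[1])))
        # The decoder is conditioned only on the encoder's final (hidden, cell)
        # state; tgt is the gold prefix, i.e. teacher forcing during training.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)   # logits over the target vocabulary
```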
-
Attention Mechanism
-
Sequence model decoders
- Exhaustive search
- Ancestral sampling
- Greedy search (a greedy step and a sampling step are sketched after this list)
- Beam search
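Sketches of two of these decoders (exhaustive search is intractable, and beam search is sketched in the Slides section above). Here `logits` is assumed to be a 1-D numpy array of unnormalized next-token scores.

```python
import numpy as np

def greedy_step(logits):
    # Greedy search: commit to the single most probable next token.
    return int(np.argmax(logits))

def ancestral_sampling_step(logits, rng):
    # Ancestral sampling: draw the next token from the model's distribution.
    probs = np.exp(logits - logits.max())   # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Usage: rng = np.random.default_rng(0); ancestral_sampling_step(logits, rng)
```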
-
Evaluation of MT systems
- BLEU
- evaluates the precision of a candidate machine translation against a reference human translation
- each reference n-gram can be matched at most once (clipped counts)
- imposes a brevity penalty on candidates shorter than the reference (a toy implementation is sketched below)
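A toy single-reference implementation of the ideas above, assuming whitespace-tokenized input; the `1e-9` floor is crude smoothing for the sketch, not part of the official metric.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    # Geometric mean of clipped n-gram precisions, times a brevity penalty.
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # Clipping: each reference n-gram can be matched at most once.
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        log_prec += math.log(max(clipped, 1e-9) / max(sum(cand.values()), 1)) / max_n
    # Brevity penalty: below 1 only when the candidate is shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_prec)

print(round(bleu("the cat sat on the mat".split(),
                 "the cat is on the mat".split(), max_n=2), 3))  # 0.707
```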
-
Word segmentation
- Byte Pair Encoding (one learning step is sketched below)
- Hybrid NMT
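One BPE learning step, using the word-frequency representation from Sennrich et al.'s toy example. The naive string `replace` is fine for this vocabulary, but a real implementation tracks symbol boundaries explicitly.

```python
from collections import Counter

def most_frequent_pair(vocab):
    # vocab maps a space-separated symbol sequence (one word) to its corpus count.
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return max(pairs, key=pairs.get)

def merge_pair(pair, vocab):
    # Rewrite every word with the chosen pair fused into one new symbol.
    old, new = " ".join(pair), "".join(pair)
    return {word.replace(old, new): freq for word, freq in vocab.items()}

# One learning step on the classic toy vocabulary:
vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
pair = most_frequent_pair(vocab)      # ('e', 's'), frequency 9
vocab = merge_pair(pair, vocab)       # "n e w es t", "w i d es t", ...
```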
Suggested Readings
-
- Statistical Machine Translation Slides, CS224n 2015 (Lectures 2/3/4): not read yet
- Statistical Machine Translation: book by Philipp Koehn; not read yet
- BLEU: original paper
- Sequence to Sequence Learning with Neural Networks: original seq2seq NMT paper
- Sequence Transduction with Recurrent Neural Networks: early seq2seq speech recognition paper
- Neural Machine Translation by Jointly Learning to Align and Translate: original seq2seq+attention paper
- Attention and Augmented Recurrent Neural Networks: blog post overview; covers tasks that leverage attention and Adaptive Computation Time
- Massive Exploration of Neural Machine Translation Architectures: a paper with practical advice for hyperparameter choices