CS224N Lecture 08 Notes

Published 2020-07-04

Machine Translation, Seq2Seq and Attention


  • Early Machine Translation

    • 1950s: rule-based systems
    • 1990s-2010s: Statistical Machine Translation (SMT)
      • learn a probabilistic model from data
      • find argmax_y P(y|x), where x is the source-language sentence and y is the target-language sentence
      • use Bayes' rule to break this into two components that are learned separately: argmax_y P(x|y)P(y)
        • P(x|y): translation model; models how words and phrases should be translated (fidelity); learned from parallel data
        • P(y): language model; models how to write good English (fluency); learned from monolingual data
  • Neural Machine Translation

    • NMT is a way to do MT with a single neural network
    • The architecture is called sequence-to-sequence (seq2seq), and it involves two RNNs: an encoder and a decoder
    • Beam search: on each step of the decoder, keep track of the k most probable partial translations (hypotheses), where k is the beam size
    • Continue beam search until either timestep T is reached or at least n hypotheses have been completed (T and n are predefined cutoffs)
  • Evaluating Machine Translation

    • BLEU (Bilingual Evaluation Understudy)
      • based on n-gram precision
      • plus a penalty for too-short system translations
  • Attention

    • Core idea: on each step of the decoder, use a direct connection to the encoder to focus on a particular part of the source sequence
    • More general definition
      given a set of vector values and a vector query, attention is a technique to compute a weighted sum of the values, dependent on the query
    • some variants
      • Basic dot-product attention
      • Multiplicative attention
      • Additive attention
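The Bayes' rule step in the SMT bullets skips one line of algebra. Written out, and assuming x is held fixed while we search over candidate translations y, the denominator P(x) is the same for every candidate and can be dropped from the argmax:

```latex
\begin{aligned}
\hat{y} &= \arg\max_{y} P(y \mid x) \\
        &= \arg\max_{y} \frac{P(x \mid y)\, P(y)}{P(x)} \\
        &= \arg\max_{y} P(x \mid y)\, P(y)
\end{aligned}
```

In this decomposition the translation model P(x|y) scores fidelity and the language model P(y) scores fluency, matching the two components listed in the SMT bullets.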
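The beam-search bullets under Neural Machine Translation can be made concrete with a short Python sketch. It assumes a hypothetical decoder interface `next_logprobs(prefix)` that returns log P(token | prefix), and uses a made-up toy distribution (`TOY_PROBS`) in place of a trained seq2seq model. `k` is the beam size, `max_steps` plays the role of timestep T, and `n_complete` is the number n of finished hypotheses. The final length normalization (dividing the summed log-probability by the hypothesis length) is one common choice, not something stated in these notes.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"
Hypothesis = Tuple[str, ...]

# Made-up toy decoder: P(next token | prefix) for every reachable prefix.
TOY_PROBS: Dict[Hypothesis, Dict[str, float]] = {
    (): {"the": 0.5, "a": 0.4, EOS: 0.1},
    ("the",): {"cat": 0.4, "dog": 0.3, EOS: 0.3},
    ("a",): {"cat": 0.9, EOS: 0.1},
    ("the", "cat"): {EOS: 1.0},
    ("the", "dog"): {EOS: 1.0},
    ("a", "cat"): {EOS: 1.0},
}


def toy_next_logprobs(prefix: Hypothesis) -> Dict[str, float]:
    """One hypothetical decoder step: log P(token | prefix) per candidate token."""
    return {tok: math.log(p) for tok, p in TOY_PROBS[prefix].items()}


def beam_search(
    next_logprobs: Callable[[Hypothesis], Dict[str, float]],
    k: int,
    max_steps: int,
    n_complete: int,
) -> List[Tuple[Hypothesis, float]]:
    """Return hypotheses as (tokens, summed log-prob), best first."""
    beams: List[Tuple[Hypothesis, float]] = [((), 0.0)]
    completed: List[Tuple[Hypothesis, float]] = []
    for _ in range(max_steps):  # cutoff 1: stop after timestep T
        # Expand every partial translation by every candidate next token.
        candidates = [
            (prefix + (tok,), score + lp)
            for prefix, score in beams
            for tok, lp in next_logprobs(prefix).items()
        ]
        # Keep only the k most probable expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        top = candidates[:k]
        # Hypotheses that produced EOS are finished and leave the beam.
        completed += [c for c in top if c[0][-1] == EOS]
        beams = [c for c in top if c[0][-1] != EOS]
        if len(completed) >= n_complete or not beams:  # cutoff 2: n finished
            break
    if not completed:  # reached T without finishing: fall back to partial ones
        completed = beams
    # Rank by log-prob per token so longer hypotheses are not unfairly penalized.
    return sorted(completed, key=lambda c: c[1] / max(len(c[0]), 1), reverse=True)


results = beam_search(toy_next_logprobs, k=2, max_steps=5, n_complete=3)
print(results[0][0])  # ('a', 'cat', '</s>')
```

With k=2 the search keeps "a" alive at the first step even though "the" is locally more probable, and so finds "a cat" (probability 0.36) rather than "the cat" (0.2). In this sketch finished hypotheses leave the beam, so it can shrink below k; other implementations refill it.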
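The three attention variants differ only in how the score e_i between the query s and each value h_i is computed; the softmax and the weighted sum are shared. A minimal pure-Python sketch, assuming the usual formulas e_i = s^T h_i (dot-product), e_i = s^T W h_i (multiplicative) and e_i = v^T tanh(W1 h_i + W2 s) (additive). The parameters W, W1, W2 and v would normally be learned; here they are hand-picked numbers, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


def softmax(xs: Vector) -> Vector:
    shift = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot_scores(values: List[Vector], s: Vector) -> Vector:
    # e_i = s^T h_i  (query and values must have the same dimension)
    return [dot(s, h) for h in values]


def multiplicative_scores(values: List[Vector], s: Vector, w: Matrix) -> Vector:
    # e_i = s^T W h_i  (W maps the value space into the query space)
    return [dot(s, matvec(w, h)) for h in values]


def additive_scores(values: List[Vector], s: Vector,
                    w1: Matrix, w2: Matrix, v: Vector) -> Vector:
    # e_i = v^T tanh(W1 h_i + W2 s)
    w2s = matvec(w2, s)
    return [dot(v, [math.tanh(a + b) for a, b in zip(matvec(w1, h), w2s)])
            for h in values]


def attend(values: List[Vector], scores: Vector) -> Tuple[Vector, Vector]:
    # Turn scores into a distribution, then take the weighted sum of the values.
    weights = softmax(scores)
    output = [sum(w * h[j] for w, h in zip(weights, values))
              for j in range(len(values[0]))]
    return weights, output


values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # e.g. encoder hidden states h_i
query = [2.0, 0.0]                              # e.g. decoder hidden state s
weights, output = attend(values, dot_scores(values, query))

# Hand-picked (not learned) parameters for the additive variant.
W1 = [[0.5, -0.5], [1.0, 0.0], [0.0, 1.0]]
W2 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
v = [1.0, -1.0, 0.5]
add_weights, add_output = attend(values, additive_scores(values, query, W1, W2, v))
print(weights, output)
print(add_weights, add_output)
```

With W set to the identity matrix, the multiplicative score reduces to the dot-product score, which is one way to see that multiplicative attention generalizes the basic version.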


  • Keyphrases: Seq2Seq and Attention Mechanisms, Neural Machine Translation, Speech Processing

  • Original Seq2Seq

    • encoder
      processes the input sequence in reverse order
    • decoder
  • Attention Mechanism

  • Sequence model decoders

    • Exhaustive search
    • Ancestral sampling
    • Greedy search
    • Beam search
  • Evaluation of MT systems

    • BLEU
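      • evaluates the precision score of a candidate machine translation against a reference human translation
      • each n-gram in the reference can be matched at most once, so repeated candidate n-grams are clipped
      • imposes a brevity penalty on candidates that are shorter than the reference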
  • Word segmentation

    • Byte Pair Encoding
    • Hybrid NMT
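The decoder strategies listed under "Sequence model decoders" can be compared on a tiny made-up next-token distribution (`TOY_PROBS`, standing in for a trained decoder). The Python sketch below implements exhaustive search, greedy search and ancestral sampling; beam search sits between greedy and exhaustive search by keeping k candidates per step instead of one or all of them. All names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

EOS = "</s>"
Hypothesis = Tuple[str, ...]
NextProbs = Callable[[Hypothesis], Dict[str, float]]

# Made-up toy decoder: P(next token | prefix) for every reachable prefix.
TOY_PROBS: Dict[Hypothesis, Dict[str, float]] = {
    (): {"the": 0.5, "a": 0.4, EOS: 0.1},
    ("the",): {"cat": 0.4, "dog": 0.3, EOS: 0.3},
    ("a",): {"cat": 0.9, EOS: 0.1},
    ("the", "cat"): {EOS: 1.0},
    ("the", "dog"): {EOS: 1.0},
    ("a", "cat"): {EOS: 1.0},
}


def next_probs(prefix: Hypothesis) -> Dict[str, float]:
    return TOY_PROBS[prefix]


def exhaustive_search(probs: NextProbs, max_len: int) -> Tuple[Hypothesis, float]:
    """Score every complete sequence: exact, but exponential in the length."""
    best: Hypothesis = ()
    best_p = 0.0
    stack: List[Tuple[Hypothesis, float]] = [((), 1.0)]
    while stack:
        prefix, p = stack.pop()
        if (len(prefix) > 0 and prefix[-1] == EOS) or len(prefix) == max_len:
            if p > best_p:
                best, best_p = prefix, p
            continue
        for tok, q in probs(prefix).items():
            stack.append((prefix + (tok,), p * q))
    return best, best_p


def greedy_search(probs: NextProbs, max_len: int) -> Hypothesis:
    """Commit to the single most probable token at every step."""
    prefix: Hypothesis = ()
    while len(prefix) < max_len:
        dist = probs(prefix)
        tok = max(dist, key=dist.get)
        prefix += (tok,)
        if tok == EOS:
            break
    return prefix


def ancestral_sample(probs: NextProbs, max_len: int, rng: random.Random) -> Hypothesis:
    """Draw each token from the model's distribution given the prefix so far."""
    prefix: Hypothesis = ()
    while len(prefix) < max_len:
        dist = probs(prefix)
        tokens = list(dist)
        tok = rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
        prefix += (tok,)
        if tok == EOS:
            break
    return prefix


print(greedy_search(next_probs, max_len=5))      # ('the', 'cat', '</s>')
print(exhaustive_search(next_probs, max_len=5))  # best is ('a', 'cat', '</s>'), p close to 0.36
print(ancestral_sample(next_probs, max_len=5, rng=random.Random(0)))
```

Greedy search commits to "the" because it is the most probable first token, but the best complete sequence starts with "a", which only exhaustive search (or a wide enough beam) recovers. Ancestral sampling returns different sequences for different random seeds.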
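The BLEU bullets (clipped n-gram precision plus a brevity penalty) can be sketched in Python for one candidate and one reference, using uniform weights over n = 1..4 and no smoothing. These are simplifying assumptions: the original metric accumulates counts over a whole test corpus and supports multiple references. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate: Sequence[str], reference: Sequence[str], n: int) -> float:
    """Clipped precision: a candidate n-gram is credited at most as often as it occurs in the reference."""
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return matched / total if total else 0.0


def brevity_penalty(c: int, r: int) -> float:
    """1 if the candidate (length c) is longer than the reference (length r), else exp(1 - r/c)."""
    if c == 0:
        return 0.0
    return 1.0 if c > r else math.exp(1 - r / c)


def sentence_bleu(candidate: Sequence[str], reference: Sequence[str], max_n: int = 4) -> float:
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # the geometric mean is 0 without smoothing
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(len(candidate), len(reference)) * math.exp(log_mean)


ref = "the cat is on the mat".split()
print(modified_precision("the the the the the the the".split(), ref, 1))  # 2/7
print(sentence_bleu(ref, ref))  # 1.0
```

The repeated-"the" candidate shows why clipping matters: plain unigram precision would be 7/7, but the reference contains "the" only twice, so the clipped precision is 2/7.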
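The Byte Pair Encoding bullet can be illustrated with a small Python sketch of the merge-learning loop: start from characters plus an end-of-word marker, then repeatedly merge the most frequent adjacent symbol pair. The word counts are made up, the `</w>` marker and first-occurrence tie-breaking are assumptions of this sketch, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

END = "</w>"  # end-of-word marker (assumed convention)
Symbols = Tuple[str, ...]


def merge_pair(symbols: Symbols, pair: Tuple[str, str]) -> Symbols:
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out: List[str] = []
    i = 0
    while i < len(symbols):
        if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs: Dict[str, int], num_merges: int) -> List[Tuple[str, str]]:
    """Learn merge operations, most frequent adjacent pair first."""
    vocab = {tuple(word) + (END,): freq for word, freq in word_freqs.items()}
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties: first pair encountered wins
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word: str, merges: List[Tuple[str, str]]) -> Symbols:
    """Apply learned merges, in order, to a (possibly unseen) word."""
    symbols: Symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}  # made-up counts
merges = learn_bpe(toy_counts, 3)
print(merges)                     # [('e', 's'), ('es', 't'), ('est', '</w>')]
print(segment("lowest", merges))  # ('l', 'o', 'w', 'est</w>')
```

"lowest" does not appear in the toy counts, yet it is still split into known symbols, which is how BPE handles words outside the training vocabulary.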

Suggested Readings

  1. Statistical Machine Translation slides, CS224n 2015 (Lectures 2/3/4)

  2. Statistical Machine Translation
    book by Philipp Koehn; not read yet

  3. BLEU
    original paper

  4. Sequence to Sequence Learning with Neural Networks
    original seq2seq NMT paper

  5. Sequence Transduction with Recurrent Neural Networks
    early seq2seq speech recognition paper

  6. Neural Machine Translation by Jointly Learning to Align and Translate
    original seq2seq+attention paper

  7. Attention and Augmented Recurrent Neural Networks
    blog post overview; some tasks leverage attention; Adaptive Computation Time

  8. Massive Exploration of Neural Machine Translation Architectures
    a paper about practical advice for hyperparameter choices