CS224N Lecture 08 Notes

Machine Translation, Seq2Seq and Attention


  • Early Machine Translation

    • 1950s: early, rule-based
    • 1990s-2010s: statistical Machine Translation
      • learn a probabilistic model from data
      • argmax_y P(y|x), x: source-language sentence; y: target-language sentence
      • use Bayes' rule to split this into two components that are learned separately: argmax_y P(x|y) P(y)
        • P(x|y): translation model; models how words and phrases should be translated (fidelity); learned from parallel data
        • P(y): language model; models how to write good English (fluency); learned from monolingual data
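The Bayes' rule step can be written out as follows. P(x) does not depend on y, so it drops out of the argmax; \hat{y} is just a label introduced here for the chosen translation.

```latex
\hat{y} = \operatorname*{argmax}_{y} P(y \mid x)
        = \operatorname*{argmax}_{y} \frac{P(x \mid y)\, P(y)}{P(x)}
        = \operatorname*{argmax}_{y} P(x \mid y)\, P(y)
```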
  • Neural Machine Translation

    • NMT is a way to do MT with a single neural network
    • The neural network architecture is called seq2seq and involves two RNNs (an encoder and a decoder)
    • Beam search: on each decoder step, keep track of the k most probable partial translations (hypotheses); k is the beam size
    • continue beam search until a preset maximum timestep T is reached, or at least n completed hypotheses have been collected (T and n are predefined cutoffs)
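A minimal sketch of the beam search loop described above, in plain Python. The `step_fn` interface, the `toy_model` table, and the parameter names `max_steps` (for T) and `n_complete` (for n) are hypothetical, introduced only for illustration; a real decoder would call the RNN instead of a lookup table.

```python
import math

def toy_model(prefix):
    # Hypothetical next-token distribution that depends only on the last token.
    table = {
        "<s>": {"he": 0.6, "I": 0.4},
        "he": {"hit": 0.55, "struck": 0.45},
        "I": {"was": 0.9, "got": 0.1},
        "hit": {"</s>": 1.0},
        "struck": {"</s>": 1.0},
        "was": {"</s>": 1.0},
        "got": {"</s>": 1.0},
    }
    return table[prefix[-1]]

def beam_search(step_fn, start_token, end_token, k=2, max_steps=10, n_complete=2):
    """Return completed hypotheses as (tokens, log_prob), best first."""
    beams = [([start_token], 0.0)]  # partial hypotheses with their log-probabilities
    completed = []
    for _ in range(max_steps):  # stop condition 1: timestep limit T
        candidates = []
        for tokens, score in beams:
            for tok, p in step_fn(tokens).items():
                if p > 0:
                    candidates.append((tokens + [tok], score + math.log(p)))
        # Keep only the k most probable expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:k]:
            if tokens[-1] == end_token:
                completed.append((tokens, score))
            else:
                beams.append((tokens, score))
        # Stop condition 2: enough completed hypotheses (or nothing left to expand).
        if len(completed) >= n_complete or not beams:
            break
    completed.sort(key=lambda c: c[1], reverse=True)
    return completed

results = beam_search(toy_model, "<s>", "</s>", k=2)
best_tokens, best_log_prob = results[0]
```

In this toy table the locally best first word ("he") leads to a less probable full sentence than "I was", so a beam of size 2 recovers a hypothesis that a purely greedy choice would miss.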
  • Evaluate machine translation

    • BLEU(Bilingual Evaluation Understudy)
      • based on n-gram precision
      • plus a penalty for too-short system translations
  • Attention

    • Core idea: on each step of the decoder, use a direct connection to the encoder to focus on a particular part of the source sequence
    • More general definition
      given a set of value vectors and a query vector, attention is a technique to compute a weighted sum of the values, with weights that depend on the query
    • some variants
      • Basic dot-product attention
      • Multiplicative attention
      • Additive attention
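A minimal sketch of basic dot-product attention in plain Python, following the general definition above. The function names are hypothetical. The comments give the usual scoring functions for the three variants, where s is the query (decoder state), h_i a value (encoder state), and W, W1, W2, v are learned parameters; only the dot-product form is implemented here.

```python
import math

# Scoring functions for the three variants (only the first is implemented below):
#   dot-product:     e_i = s . h_i
#   multiplicative:  e_i = s^T W h_i
#   additive:        e_i = v^T tanh(W1 h_i + W2 s)

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_product_attention(query, values):
    """Return (weights, output): attention distribution and weighted sum of values."""
    scores = [sum(q * h_j for q, h_j in zip(query, h)) for h in values]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * h[j] for w, h in zip(weights, values)) for j in range(dim)]
    return weights, output

# The query points along the first value, so nearly all weight goes there.
weights, output = dot_product_attention([1.0, 0.0], [[10.0, 0.0], [0.0, 10.0]])
```

Dot-product scoring requires the query and the values to have the same dimension; the multiplicative and additive forms lift that restriction at the cost of extra parameters.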


  • Keyphrases: Seq2Seq and Attention Mechanisms, Neural Machine Translation, Speech Processing

  • Original Seq2Seq

    • encoder
      processes the input sequence in reverse order, so the last tokens the encoder reads are the first source words
    • decoder
  • Attention Mechanism

  • Sequence model decoders

    • Exhaustive search
    • Ancestral sampling
    • Greedy search
    • Beam search
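A small sketch contrasting two of the decoding strategies listed above: greedy search takes the most probable token at each step, while ancestral sampling draws each token from the model's distribution. The `step_fn` interface and the `toy_model` table are hypothetical stand-ins for a trained decoder.

```python
import random

def toy_model(prefix):
    # Hypothetical next-token distribution that depends only on the last token.
    table = {
        "<s>": {"he": 0.6, "I": 0.4},
        "he": {"hit": 0.55, "struck": 0.45},
        "I": {"was": 0.9, "got": 0.1},
        "hit": {"</s>": 1.0},
        "struck": {"</s>": 1.0},
        "was": {"</s>": 1.0},
        "got": {"</s>": 1.0},
    }
    return table[prefix[-1]]

def greedy_decode(step_fn, start_token, end_token, max_steps=10):
    tokens = [start_token]
    for _ in range(max_steps):
        dist = step_fn(tokens)
        tokens.append(max(dist, key=dist.get))  # argmax at every step
        if tokens[-1] == end_token:
            break
    return tokens

def ancestral_sample(step_fn, start_token, end_token, max_steps=10, rng=None):
    rng = rng or random.Random()
    tokens = [start_token]
    for _ in range(max_steps):
        dist = step_fn(tokens)
        choices, probs = zip(*dist.items())
        # Draw the next token in proportion to its conditional probability.
        tokens.append(rng.choices(choices, weights=probs, k=1)[0])
        if tokens[-1] == end_token:
            break
    return tokens

greedy = greedy_decode(toy_model, "<s>", "</s>")
sample = ancestral_sample(toy_model, "<s>", "</s>", rng=random.Random(0))
```

Greedy decoding is deterministic but cannot undo an early choice; sampling gives varied outputs with higher variance. Exhaustive search would score every possible sequence, which is intractable for realistic vocabularies.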
  • Evaluation of MT systems

    • BLEU
      • evaluates the precision of a candidate machine translation against a reference human translation
      • each n-gram in the reference can be matched at most once (counts are clipped)
      • impose a brevity penalty
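A simplified, single-reference sketch of the BLEU computation outlined above: clipped n-gram precision combined with a brevity penalty. The uniform weighting of the n-gram precisions and the hard zero when any precision is zero are assumptions of this sketch; other weightings and smoothing schemes exist.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        if total == 0:
            return 0.0
        # Clipping: a candidate n-gram counts at most as often as it occurs in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if clipped == 0:
            return 0.0
        precisions.append(clipped / total)
    # Geometric mean of the n-gram precisions (uniform weights, an assumption here).
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_mean)

reference = "the cat sat on the mat".split()
perfect = bleu(reference, reference)
repeated = bleu("the the the the".split(), reference, max_n=1)
```

In the second call, "the" appears four times in the candidate but only twice in the reference, so the clipped unigram precision is 2/4 rather than 4/4, and the brevity penalty lowers the score further because the candidate is shorter than the reference.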
  • Word segmentation

    • Byte Pair Encoding
    • Hybrid NMT (word-level model with a character-level fallback for rare or unknown words)
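A minimal sketch of how Byte Pair Encoding learns its merge rules: start from characters and repeatedly merge the most frequent adjacent symbol pair. The function name `learn_bpe`, the `</w>` end-of-word marker, and the toy word counts are illustrative assumptions, not a specific library's API.

```python
from collections import Counter

def learn_bpe(word_counts, num_merges):
    """Return (merges, vocab) after greedily merging the most frequent symbol pairs."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab = {}
        for symbols, count in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])  # fuse the pair
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + count
        vocab = new_vocab
    return merges, vocab

merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=3)
```

After three merges on this toy corpus, the frequent suffix "est" plus the end-of-word marker has become a single symbol, so "newest" and "widest" share a subword unit while rare words remain split into smaller pieces.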

