CS224N Lecture 06 Notes

Published on 2020-06-20 · 158 views

The probability of a sentence? Recurrent Neural Networks and Language Models


  • Language Model
    the task of predicting what word comes next; assigns a probability to a piece of text

  • n-gram Language Model (pre-deep learning)

    • An n-gram is a chunk of n consecutive words
    • to handle the sparsity problem: smoothing; backoff
  • neural Language Model

    • a fixed-window neural Language Model (like the NER model in lecture 03)
    • RNN (applies the same weights W repeatedly; symmetry)
  • Evaluating Language Models

    • the standard evaluation metric is perplexity: equal to the exponential of the cross-entropy loss, exp(J(Θ)); lower perplexity is better
  • Recurrent Neural Network

    • take sequential input of any length
    • apply the same weights on each step
    • can optionally produce output on each step
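The recurrence described above can be sketched in numpy. This is a minimal illustration, not the lecture's implementation; the dimensions and weight names (`W_h`, `W_e`) are assumptions chosen to mirror the slides' notation:

```python
import numpy as np

# Illustrative dimensions (assumptions, not from the lecture)
d_embed, d_hidden = 4, 8
rng = np.random.default_rng(0)

# The same weights are reused at every time step
W_h = rng.normal(scale=0.1, size=(d_hidden, d_hidden))  # hidden-to-hidden
W_e = rng.normal(scale=0.1, size=(d_hidden, d_embed))   # input-to-hidden
b = np.zeros(d_hidden)

def rnn_forward(embeddings):
    """Vanilla RNN recurrence: h_t = tanh(W_h h_{t-1} + W_e e_t + b).

    Accepts a sequence of any length precisely because W_h and W_e
    are shared across all steps.
    """
    h = np.zeros(d_hidden)
    hidden_states = []
    for e_t in embeddings:
        h = np.tanh(W_h @ h + W_e @ e_t + b)
        hidden_states.append(h)  # optionally, an output can be read from each h
    return hidden_states

# Works for sequences of any length with the same parameters
states = rnn_forward(rng.normal(size=(5, d_embed)))
```

Note that nothing in the loop depends on the sequence length — this is what lets one RNN process inputs of any length.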


  • Keyphrases: Language Models; RNN; Bi-directional RNN; GRU; LSTM; Deep RNN

  • n-gram Language models

    • sparsity problems: use smoothing; use backoff; increasing n makes sparsity worse, so typically n <= 5

    • storage problems: increasing n increases the model size
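A toy bigram model makes the sparsity point concrete: without smoothing, any unseen bigram gets probability zero. The sketch below uses add-k smoothing; the corpus and function names are illustrative, not from the lecture:

```python
from collections import Counter

def train_bigram_counts(corpus):
    """Count unigrams and bigrams from a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def add_k_prob(w_prev, w, unigrams, bigrams, k=1.0):
    """P(w | w_prev) with add-k smoothing: (count + k) / (context + k*|V|)."""
    vocab_size = len(unigrams)
    return (bigrams[(w_prev, w)] + k) / (unigrams[w_prev] + k * vocab_size)

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram_counts(corpus)

# The bigram ("cat", "dog") never occurs, yet smoothing gives it nonzero mass
p_unseen = add_k_prob("cat", "dog", uni, bi)
p_seen = add_k_prob("the", "cat", uni, bi)
```

With k=1 this is Laplace (add-one) smoothing; smaller k discounts seen events less aggressively.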

  • Window-based Neural Language Model
    example by Bengio (a classic paper)

  • Recurrent Neural Network(RNN)

    • RNN loss and perplexity, expressed as formulas
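The loss and perplexity referenced above, written out (notation follows the CS224N slides: ŷ is the predicted distribution at step t, x_{t+1} the true next word, T the corpus length):

```latex
% Cross-entropy loss at step t: negative log-probability of the true next word
J^{(t)}(\theta) = -\log \hat{y}^{(t)}_{x_{t+1}}

% Overall loss: average over the T steps of the corpus
J(\theta) = \frac{1}{T} \sum_{t=1}^{T} J^{(t)}(\theta)

% Perplexity is the exponential of the cross-entropy loss
\mathrm{perplexity} = \exp\!\left(J(\theta)\right)
```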

Suggested Readings

  1. N-gram Language Models (textbook chapter)

    • extrinsic evaluation (end-to-end)
    • perplexity
    • out of vocabulary(OOV)
      open vocabulary: use <UNK> for unknown words; note that an LM can achieve low perplexity by choosing a small vocabulary and assigning the unknown word a high probability
    • Smoothing
      • Laplace smoothing (add-one smoothing); discounted probability
      • Add-k smoothing
      • backoff
      • interpolation
        mix the probability estimates from all the n-gram estimators
      • Katz backoff
        relies on discounted probabilities so that the total probability mass does not exceed 1; often combined with a smoothing method called "Good-Turing"
      • Kneser-Ney smoothing (most commonly used)
        assumes that words that appear in more contexts are more likely to appear in new contexts
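Of the methods above, interpolation is the simplest to show in code: mix the MLE estimates from different n-gram orders with weights that sum to 1. The counts and lambda values below are illustrative; in practice the lambdas are tuned on held-out data (e.g. via EM):

```python
from collections import Counter

# Toy counts (assumed for illustration)
unigrams = Counter({"the": 2, "cat": 1, "sat": 2, "dog": 1})
bigrams = Counter({("the", "cat"): 1, ("the", "dog"): 1,
                   ("cat", "sat"): 1, ("dog", "sat"): 1})
total = sum(unigrams.values())

def interpolated_prob(w_prev, w, lambdas=(0.7, 0.3)):
    """P(w | w_prev) = lam_bi * P_MLE(w | w_prev) + lam_uni * P_MLE(w).

    The lambdas must sum to 1; unlike backoff, interpolation always
    mixes in the lower-order estimate, even when the bigram was seen.
    """
    lam_bi, lam_uni = lambdas
    p_bi = bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0
    p_uni = unigrams[w] / total
    return lam_bi * p_bi + lam_uni * p_uni

# The bigram ("cat", "dog") is unseen, yet the unigram term keeps P > 0
p = interpolated_prob("cat", "dog")
```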
  2. The Unreasonable Effectiveness of Recurrent Neural Networks
    blog post overview; the RNN's ability to capture the structure of text is impressive: it learns LaTeX markup, Linux source code, etc.

  3. Sequence Modeling: Recurrent and Recursive Neural Nets(Section 10.1 and 10.2)
    covers several variants of how the previous hidden state is fed into the next hidden state; only skimmed

  4. On Chomsky and the Two Cultures of Statistical Learning