CS224N Lecture 11 Notes

Published on 2020-07-12

ConvNets for NLP


  • CNN

    • 1-D discrete convolution, generally: (f * g)[n] = ∑_m f[n − m] g[m]
      • padding
      • max-pooling over time
      • k-max pooling over time (keeps the k largest values, preserving their order in the original input)
      • stride
      • dilation
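The options above can be sketched in a few lines of NumPy. This is a minimal illustration (using the cross-correlation convention common in deep learning, not the flipped-filter formula above); `conv1d` and `kmax_pool` are hypothetical helper names, not lecture code:

```python
import numpy as np

def conv1d(x, w, pad=0, stride=1, dilation=1):
    """1-D convolution (cross-correlation convention) over sequence x with
    filter w, supporting zero padding, stride, and dilation."""
    x = np.pad(x, pad)                            # zero padding on both ends
    k_eff = (len(w) - 1) * dilation + 1           # effective filter span
    out = []
    for i in range(0, len(x) - k_eff + 1, stride):
        taps = x[i : i + k_eff : dilation]        # dilated filter taps
        out.append(float(np.dot(taps, w)))
    return np.array(out)

def kmax_pool(v, k):
    """k-max pooling over time: keep the k largest values in original order."""
    idx = np.sort(np.argsort(v)[-k:])             # top-k indices, re-sorted
    return v[idx]

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
w = np.array([1.0, 0.0, -1.0])
feat = conv1d(x, w, pad=1)     # feature map from one filter
top1 = feat.max()              # max-pooling over time
top2 = kmax_pool(feat, 2)      # 2-max pooling, order preserved
```

Max-pooling over time collapses the feature map to a single scalar per filter, while k-max pooling retains a fixed-size ordered subsequence.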
  • Single Layer CNN for Sentence Classification

    • a simple use of one convolutional layer and pooling
  • Batch Normalization (BatchNorm)

    • often used in CNNs
    • transform the convolution output of a batch by scaling the activations to have zero mean and unit variance
    • "This is the familiar Z-transform of statistics"
    • But updated per batch, so fluctuations do not affect things much
    • use of BatchNorm makes models much less sensitive to parameter initialization, since outputs are automatically rescaled
    • It also tends to make tuning of learning rates simpler
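The transform described above can be sketched directly. This is a minimal version of the normalization step (the inference-time running statistics are omitted); `batch_norm`, `gamma`, and `beta` are illustrative names:

```python
import numpy as np

def batch_norm(h, gamma=1.0, beta=0.0, eps=1e-5):
    """Scale a batch of activations (batch, features) to zero mean and
    unit variance per feature, then rescale with learnable gamma/beta."""
    mu = h.mean(axis=0)
    var = h.var(axis=0)
    h_hat = (h - mu) / np.sqrt(var + eps)
    return gamma * h_hat + beta

h = np.array([[1.0, 10.0],
              [3.0, 30.0],
              [5.0, 50.0]])
out = batch_norm(h)
# per-feature mean is ~0 and std is ~1, regardless of the input scale
```

Because the output scale is fixed by the normalization, the layer's behavior no longer depends on how large the incoming weights happen to be, which is why initialization and learning-rate choices become less delicate.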
  • 1x1 Convolutions (Network-in-Network, NiN)
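A 1x1 convolution looks at a single position at a time, so it reduces to a position-wise linear map across channels. A minimal sketch (shapes and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))   # (positions, C_in): 8 time steps, 16 channels
W = rng.standard_normal((16, 4))   # 1x1 conv kernel = channel-mixing matrix C_in -> C_out

# Applying the same matrix at every position: no information crosses positions,
# only channels are mixed (and the channel count can be reduced cheaply).
y = x @ W                          # (positions, C_out) = (8, 4)
```

This is why NiN layers are often used to shrink the channel dimension before a more expensive convolution.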

  • Quasi-Recurrent Neural Network (QRNN)
    combines the parallel convolutions of a CNN with an RNN-style recurrence
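One way to see the CNN/RNN split is the QRNN's "f-pooling" recurrence from Bradbury et al.: convolutions produce candidate vectors and forget gates for all timesteps in parallel, and only a cheap element-wise recurrence runs sequentially. A hedged sketch with random stand-ins for the conv outputs (`qrnn_fpool` is an illustrative name):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def qrnn_fpool(Z, F):
    """QRNN f-pooling: h_t = f_t * h_{t-1} + (1 - f_t) * z_t.
    Z (candidates) and F (forget gates) are (T, d) arrays that a real QRNN
    would compute in parallel with convolutions over the input sequence."""
    h = np.zeros(Z.shape[1])
    hs = []
    for z_t, f_t in zip(Z, F):
        h = f_t * h + (1.0 - f_t) * z_t   # element-wise only: no matrix multiply
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(1)
T, d = 5, 3
Z = np.tanh(rng.standard_normal((T, d)))     # stand-in for conv candidate outputs
F = sigmoid(rng.standard_normal((T, d)))     # stand-in for conv gate outputs
H = qrnn_fpool(Z, F)                         # hidden states, shape (T, d)
```

The sequential part involves no matrix multiplications, which is what makes the QRNN much faster than an LSTM while retaining a recurrence.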


  • Multiple channels
    two copies of the word vectors: one static (no gradient flow) and one dynamic (fine-tuned during training)

  • CNN options

    • Narrow vs. wide convolution (i.e., whether zero padding is used)
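The narrow/wide distinction is just an output-length question: for input length L and filter width k, a narrow (no-padding) convolution yields L − k + 1 outputs, while a wide one pads k − 1 zeros on each side and yields L + k − 1. NumPy's `convolve` modes demonstrate this directly:

```python
import numpy as np

L, k = 7, 3
x = np.ones(L)
w = np.ones(k)

narrow = np.convolve(x, w, mode="valid")   # no zero padding: L - k + 1 outputs
wide   = np.convolve(x, w, mode="full")    # k - 1 zeros each side: L + k - 1 outputs
```

Wide convolution guarantees every input position (including the edges) appears in the same number of filter windows, which matters for short sentences where the filter is nearly as wide as the input.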

Suggested Readings

  1. Convolutional Neural Networks for Sentence Classification
    the simple CNN introduced in this lecture; multichannel word vectors (fine-tuned or not)

  2. A Convolutional Neural Network for Modeling Sentences
    dynamic k-max pooling (keeps the relative order); the folding operation in CNNs