CS224N Lecture 04 Notes

Published 2020-06-02

Backpropagation and Computation Graphs


  • When to fine-tune (word embeddings)?
    With a small training set, don't fine-tune; with a large dataset, do train (i.e. update) the embeddings.
  • Computation Graphs and Backpropagation
    • Source nodes: inputs
    • Interior nodes: operations
    • Edges pass along the result of the operation
    • [downstream gradient] = [upstream gradient] × [local gradient]
  • Regularization
    over all parameters Θ (except the biases)
  • Vectorization
    always compute with vectors and matrices rather than for loops
  • Non-linearities
    when building a feed-forward deep network, try ReLU first: it trains quickly and performs well thanks to good gradient backflow
  • Parameter Initialization
    • hidden-layer biases → 0; output biases → the mean target (or the inverse sigmoid of the mean target)
    • all other weights ~ Uniform(−r, r)
    • Xavier initialization (the scale r depends on the fan-in n_in and fan-out n_out of the layer)
  • Optimizers


same as the notes for Lecture 03
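The backpropagation rule above ([downstream gradient] = [upstream gradient] × [local gradient]) can be worked through by hand on a tiny computation graph. A minimal sketch in plain Python, using the illustrative graph f = (x·y + z)²:

```python
# Forward pass through a tiny computation graph: f = (x * y + z) ** 2
x, y, z = 2.0, -3.0, 4.0
a = x * y          # interior node: multiply
b = a + z          # interior node: add
f = b ** 2         # output node

# Backward pass: at every node,
# [downstream gradient] = [upstream gradient] * [local gradient]
df_db = 2 * b      # local gradient of f = b**2
da_dx = y          # local gradient of a = x * y w.r.t. x
da_dy = x          # ... and w.r.t. y

grad_b = 1.0 * df_db   # upstream gradient at the output is 1
grad_a = grad_b * 1.0  # local gradient of b = a + z w.r.t. a is 1
grad_z = grad_b * 1.0
grad_x = grad_a * da_dx
grad_y = grad_a * da_dy

# Sanity check grad_x against a central finite difference
eps = 1e-6
f_plus = ((x + eps) * y + z) ** 2
f_minus = ((x - eps) * y + z) ** 2
num_grad_x = (f_plus - f_minus) / (2 * eps)
assert abs(grad_x - num_grad_x) < 1e-4
```

Note how the add node simply copies the upstream gradient to both inputs, while the multiply node swaps in the other operand as the local gradient.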
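The regularization bullet can be sketched as an L2 penalty summed over all weight matrices while skipping bias vectors; the parameter dictionary, its naming scheme, and the `lam` coefficient below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 2-layer network parameters; biases are named "b*"
params = {
    "W1": rng.standard_normal((4, 3)), "b1": np.zeros(4),
    "W2": rng.standard_normal((2, 4)), "b2": np.zeros(2),
}
lam = 1e-3  # regularization strength (arbitrary for this sketch)

# L2 penalty over all parameters theta, except the biases
reg_loss = lam * sum(np.sum(p ** 2) for name, p in params.items()
                     if not name.startswith("b"))
```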
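The vectorization advice can be illustrated by comparing a per-example loop with a single matrix-matrix multiply over the whole batch (the shapes here are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((100, 50))
X = rng.standard_normal((50, 1000))   # 1000 column-vector examples

# Loop version: transform one column at a time
out_loop = np.stack([W @ X[:, i] for i in range(X.shape[1])], axis=1)

# Vectorized version: one matrix-matrix multiply over the whole batch
out_vec = W @ X

# Same result, but the vectorized form is far faster in practice
assert np.allclose(out_loop, out_vec)
```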
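The "good gradient backflow" of ReLU can be seen directly in its backward rule: the local gradient is 1 wherever the pre-activation is positive, so the upstream gradient passes through undiminished (unlike a saturating sigmoid). A sketch, with `relu_backward` as a hypothetical helper name:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_backward(x, upstream):
    # Local gradient is 1 where x > 0 and 0 elsewhere, so positive
    # pre-activations pass the upstream gradient through unchanged
    return upstream * (x > 0)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
up = np.ones_like(x)
```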
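One common reading of the Xavier bullet: draw weights from Uniform(−r, r) with r chosen from the fan-in and fan-out so that the weight variance is 2 / (n_in + n_out). A sketch (the helper name is mine):

```python
import numpy as np

def xavier_uniform(n_in, n_out, rng):
    # Uniform(-r, r) has variance r**2 / 3, so r = sqrt(6 / (n_in + n_out))
    # gives Var(W_ij) = 2 / (n_in + n_out), keeping activation and gradient
    # scales roughly constant across layers
    r = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-r, r, size=(n_out, n_in))

W = xavier_uniform(400, 200, np.random.default_rng(0))
```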

Suggested Readings

  1. Derivatives, Backpropagation, and Vectorization
    • Gradient: vector in, scalar out
    • Jacobian: vector in, vector out
    • Generalized Jacobian: tensor in, tensor out
  2. CS 231n notes on network architectures
    not available
  3. Learning Representations by Backpropagation
    early paper on backpropagation by Rumelhart, Hinton, and Williams
  4. Yes you should understand backprop
    a blog post by Andrej Karpathy
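The three cases in reading 1 differ only in the shape of the derivative: a gradient matches the input's shape, while a Jacobian of a map from R^n to R^m has shape (m, n). A small finite-difference check of the Jacobian case (`numerical_jacobian` is an illustrative helper, not from the reading):

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Forward-difference Jacobian of f: R^n -> R^m, with shape (m, n)."""
    y = f(x)
    J = np.zeros((y.size, x.size))
    for j in range(x.size):
        x_step = x.copy()
        x_step[j] += eps
        J[:, j] = (f(x_step) - y) / eps
    return J

W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([0.5, -1.0, 2.0])

# For a linear map f(x) = W x, the Jacobian is W itself
J = numerical_jacobian(lambda v: W @ v, x)
```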