CS224N Lecture 14 Notes



Transformers and Self-Attention

Slides

  • Self-Attention
    • Constant "path length" between any two positions (a sketch follows this list)
    • Unbounded memory: every past position stays directly addressable
    • Trivial to parallelize (per layer)
    • Models self-similarity
    • Relative attention provides expressive timing, equivariance, and extends naturally to graphs (see the second sketch below)
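A minimal NumPy sketch of scaled dot-product self-attention to make the first three bullets concrete; the function and parameter names (self_attention, Wq, Wk, Wv) are mine, not from the slides:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # all pairwise scores in one matmul
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)  # softmax over key positions
    return weights @ V                         # weighted sum of values

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))                    # 5 positions, model dim 8
out = self_attention(X, *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)                               # (5, 8)
```

The single matrix multiply over all position pairs is why the path length between any two positions is constant, why the layer parallelizes trivially, and why every earlier position remains directly reachable rather than being squeezed through a fixed-size recurrent state.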
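For the relative-attention bullet, here is a hedged sketch in the style of Shaw et al. (2018), the key-side relative position representations that Music Transformer builds on; the clipping window max_dist and all names are assumptions, not from the slides:

```python
import numpy as np

def relative_attention(Q, K, V, rel_k, max_dist=4):
    """rel_k: (2*max_dist + 1, d_k) learned embeddings for clipped offsets j - i."""
    n, d_k = Q.shape
    scores = Q @ K.T                           # content-based scores
    for i in range(n):
        for j in range(n):
            # add a score for the (clipped) relative offset, shifted to index rel_k
            offset = np.clip(j - i, -max_dist, max_dist) + max_dist
            scores[i, j] += Q[i] @ rel_k[offset]
    scores /= np.sqrt(d_k)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 6, 8
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
rel_k = rng.normal(size=(2 * 4 + 1, d))
print(relative_attention(Q, K, V, rel_k).shape)  # (6, 8)
```

Because the extra score depends only on the offset j - i and not on absolute positions, the layer is translation-equivariant; this is the "expressive timing" and equivariance point, and replacing the 1-D offset with an arbitrary pairwise edge label is what extends the idea to graphs.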

Suggested Readings

  1. Image Transformer
  2. Music Transformer: Generating Music with Long-Term Structure
