Hung-yi Lee Machine Learning Notes (1)

 

Optimization

Adam

Batch Normalization

How Does Batch Normalization Help Optimization?

Self-Attention

Self-Attention vs Convolutional Layers

On the Relationship between Self-Attention and Convolutional Layers

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

GAN

Cycle GAN

Star GAN

https://openai.com/blog/generative-models/

BERT

ELMo

BERT

GPT-2