Learning Transferable Visual Models From Natural Language Supervision

CLIP: learning visual representations from natural language supervision.

Long-tail learning via logit adjustment

Logit Adjustment Loss: incorporating class frequencies into the logits.

On the Relationship between Self-Attention and Convolutional Layers

Understanding the relationship between self-attention and convolutional layers.

Improving Language Understanding by Generative Pre-Training

GPT: improving language understanding with a generatively pre-trained model.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT: obtaining contextual encoded representations from the Transformer.

Deep contextualized word representations

ELMo: word embeddings derived from a language model.