Long-tail learning via logit adjustment

Logit Adjustment Loss: incorporates class occurrence frequencies into the logits.
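
As a minimal sketch of the loss-based variant described in the paper (the names `class_counts` and `tau` are illustrative, with `tau` standing in for the paper's temperature τ):

```python
import torch
import torch.nn.functional as F

def logit_adjusted_loss(logits, labels, class_counts, tau=1.0):
    """Cross-entropy with logit adjustment: shift each class logit by
    tau * log(prior) before the softmax, so frequent classes do not
    drown out rare ones."""
    priors = class_counts.float() / class_counts.sum()  # empirical class frequencies
    adjusted = logits + tau * torch.log(priors)         # broadcast over the batch
    return F.cross_entropy(adjusted, labels)
```

The paper also studies a post-hoc variant that applies the same log-prior shift to the logits of a conventionally trained model at prediction time.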

On the Relationship between Self-Attention and Convolutional Layers

Analyzes the relationship between self-attention and convolutional layers, showing that a multi-head self-attention layer with relative positional encodings can express any convolutional layer.

Improving Language Understanding by Generative Pre-Training

GPT: uses generative pre-training to improve language understanding.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT: obtains contextual encoded representations from a bidirectional Transformer encoder.
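
To make "contextual representations" concrete, here is a small usage sketch with the HuggingFace transformers library (not part of the original paper): every token receives a vector that depends on its surrounding sentence.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# The same word would get a different vector in a different sentence context.
inputs = tokenizer("I sat on the river bank.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual embedding per token: shape (1, seq_len, 768).
token_embeddings = outputs.last_hidden_state
```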

Deep contextualized word representations

ELMo: uses a language model to produce word embeddings.

Deformable DETR: Deformable Transformers for End-to-End Object Detection

Deformable DETR: uses a multi-scale deformable attention module for end-to-end object detection.
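
As a rough single-scale, single-head sketch of the deformable attention idea (the paper's module is multi-head and multi-scale with custom CUDA kernels; all names and shapes below are illustrative assumptions): each query attends only to a small set of sampled points around its reference point, rather than to all spatial positions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableAttentionSketch(nn.Module):
    """Single-scale, single-head sketch: each query predicts K sampling
    offsets around its reference point plus K attention weights, then
    aggregates bilinearly sampled values at those locations."""

    def __init__(self, dim, num_points=4):
        super().__init__()
        self.num_points = num_points
        self.offsets = nn.Linear(dim, num_points * 2)  # (dx, dy) per sampled point
        self.weights = nn.Linear(dim, num_points)      # one weight per sampled point
        self.value_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, query, value, ref_points):
        # query: (B, Nq, C); value: (B, C, H, W); ref_points: (B, Nq, 2) in [-1, 1]
        B, C = value.shape[:2]
        v = self.value_proj(value.flatten(2).transpose(1, 2))  # (B, H*W, C)
        v = v.transpose(1, 2).reshape(B, C, *value.shape[2:])  # (B, C, H, W)
        # Offsets are predicted directly in normalized coordinates here;
        # the real module predicts pixel offsets and normalizes per scale.
        offs = self.offsets(query).reshape(B, -1, self.num_points, 2)
        locs = (ref_points.unsqueeze(2) + offs).clamp(-1, 1)   # (B, Nq, K, 2)
        sampled = F.grid_sample(v, locs, align_corners=False)  # (B, C, Nq, K)
        attn = F.softmax(self.weights(query), dim=-1)          # (B, Nq, K)
        out = (sampled * attn.unsqueeze(1)).sum(-1)            # (B, C, Nq)
        return self.out_proj(out.transpose(1, 2))              # (B, Nq, C)
```

Because each query touches only `num_points` locations, the cost no longer scales with the full feature-map size, which is what lets the full module operate on multi-scale feature maps.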