An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

ViT: image classification with a Transformer applied to a sequence of image patches.
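As a sketch of what "a sequence of image patches" means: a minimal NumPy patchify function (assumed names; the real ViT additionally applies a learned linear projection, prepends a class token, and adds position embeddings — none of that is shown here). The 224×224 input and 16×16 patch size match the title's "16x16 words".

```python
import numpy as np

def patchify(image, patch_size=16):
    """Split an (H, W, C) image into a sequence of flattened patches,
    the token sequence a ViT-style Transformer consumes."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    # (H, W, C) -> (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C)
    grid = image.reshape(h // patch_size, patch_size,
                         w // patch_size, patch_size, c)
    patches = grid.transpose(0, 2, 1, 3, 4)
    # Flatten each patch into one vector: (num_patches, p*p*C).
    return patches.reshape(-1, patch_size * patch_size * c)

image = np.random.rand(224, 224, 3)
tokens = patchify(image)
print(tokens.shape)  # (196, 768): 14x14 patches, each 16*16*3 values
```

Each of the 196 patch vectors would then be linearly projected to the model dimension before entering the Transformer encoder.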

Generative Pretraining from Pixels

iGPT: a generative pretraining model for images at the pixel level.

Do We Need Zero Training Loss After Achieving Zero Training Error?

Flooding: preventing the training loss from reaching zero.
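The flooding trick is a one-line change to the training objective: with a flood level b, the loss J is replaced by |J − b| + b. A minimal sketch (the helper name and b = 0.1 are illustrative, not from the paper):

```python
def flooded_loss(loss, b=0.1):
    """Flooding: with flood level b, replace the loss by |loss - b| + b.
    Above b the value and gradient are unchanged; below b the gradient's
    sign flips, so optimization ascends back toward b instead of driving
    the training loss all the way to zero."""
    return abs(loss - b) + b

# Above the flood level: identical to the original loss.
print(flooded_loss(0.5, b=0.1))
# Below the flood level: reflected upward, keeping the loss near b.
print(flooded_loss(0.02, b=0.1))
```

In practice the same expression is applied to the mini-batch loss tensor before calling backward, so training "floats" around the flood level rather than memorizing to zero loss.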

REALM: Retrieval-Augmented Language Model Pre-Training

REALM: augmenting language model pretraining with a retriever.

OneNet: Towards End-to-End One-Stage Object Detection

OneNet: an NMS-free, one-stage, end-to-end object detection method.

Implicit Gradient Regularization

Implicit gradient regularization.
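The paper's headline result, as I recall it (treat the exact constant as an assumption), is that finite-step gradient descent with step size $h$ on a loss $E(\theta)$ implicitly follows a modified loss with a gradient-norm penalty:

```latex
\tilde{E}(\theta) = E(\theta) + \frac{h}{4}\,\lVert \nabla E(\theta) \rVert^{2}
```

So larger learning rates impose a stronger implicit penalty on the gradient norm, biasing training toward flatter regions of the loss surface without any explicit regularization term.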