BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding — BERT: learns contextual encoded representations from a Transformer.
Deformable DETR: Deformable Transformers for End-to-End Object Detection — Deformable DETR: performs object detection with multi-scale deformable attention modules.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale — ViT: classifies images with a Transformer over sequences of image patches.