LeViT: a Vision Transformer in ConvNet’s Clothing for Faster Inference

LeViT: a vision Transformer in the form of a convolutional network, built for fast inference.

Escaping the Big Data Paradigm with Compact Transformers

CCT: using compact Transformers to avoid dependence on big data.

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

T2T-ViT: training vision Transformers from scratch on ImageNet.

Going deeper with Image Transformers

CaiT: deeper vision Transformers.

ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

ConvNeXt V2: co-designing and scaling convolutional networks with masked autoencoders (MAE).

DeepViT: Towards Deeper Vision Transformer

DeepViT: building deeper vision Transformers.