Incorporating Convolution Designs into Visual Transformers

CeiT: incorporating convolution designs into visual Transformers.

CvT: Introducing Convolutions to Vision Transformers

CvT: introducing convolutions into vision Transformers.

Per-Pixel Classification is Not All You Need for Semantic Segmentation

MaskFormer: per-pixel classification is not all you need for semantic segmentation.

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

PVT: a versatile, convolution-free backbone for dense prediction.

Segmenter: Transformer for Semantic Segmentation

Segmenter: a vision Transformer designed for semantic segmentation.

Rethinking Spatial Dimensions of Vision Transformers

PiT: rethinking the spatial dimensions of vision Transformers.