Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

Improving CLIP Training with Language Rewrites

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

Attentive Mask CLIP

MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining

Scaling Language-Image Pre-training via Masking
