Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision.
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining.