MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining

Scaling Language-Image Pre-training via Masking

SLIP: Self-supervision meets Language-Image Pre-training

Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
