MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining.
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models.