BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

BLIP:引导式语言-图像预训练实现统一的视觉-语言理解和生成.

GLIPv2: Unifying Localization and Vision-Language Understanding

GLIPv2:统一定位和视觉语言理解.

(河北篇)保定:直隶故都文脉厚,驴肉火烧滋味长

(Hebei Chapter) Baoding: Opening the Door to the Capital.

微调 Grounding DINO 和 Label Studio 进行半自动化目标检测标注

Semiautomatic Image Annotation with Grounding DINO and Label Studio.

CoCa: Contrastive Captioners are Image-Text Foundation Models

CoCa:对比描述器是图像文本基础模型.

VinVL: Revisiting Visual Representations in Vision-Language Models

VinVL:重新回归视觉语言模型中的视觉表示.