GLIPv2: Unifying Localization and Vision-Language Understanding

GLIPv2:统一定位和视觉语言理解.

(河北篇)保定:直隶故都文脉厚,驴肉火烧滋味长

(Hebei Chapter) Baoding: Opening the Door to the Capital.

微调 Grounding DINO 和 Label Studio 进行半自动化目标检测标注

Semiautomatic Image Annotation with Grounding DINO and Label Studio.

CoCa: Contrastive Captioners are Image-Text Foundation Models

CoCa:对比描述器是图像文本基础模型.

VinVL: Revisiting Visual Representations in Vision-Language Models

VinVL:重新回归视觉语言模型中的视觉表示.

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

SimVLM:弱监督的简单视觉语言模型预训练.