Fine-tuning Grounding DINO and Label Studio for semi-automatic object detection annotation

Semiautomatic Image Annotation with Grounding DINO and Label Studio.

CoCa: Contrastive Captioners are Image-Text Foundation Models

VinVL: Revisiting Visual Representations in Vision-Language Models

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

GIT: A Generative Image-to-text Transformer for Vision and Language

VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts