Video generation models as world simulators

DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment

DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection

Learning Object-Language Alignments for Open-Vocabulary Object Detection

RegionCLIP: Region-based Language-Image Pretraining

Exploiting Unlabeled Data with Vision and Language Models for Object Detection
