VisualBERT: A Simple and Performant Baseline for Vision and Language

VisualBERT: a simple and effective baseline for vision and language.

LXMERT: Learning Cross-Modality Encoder Representations from Transformers

LXMERT: learning cross-modality encoder representations from Transformers.

Vision-Language Pretraining

Vision-Language Pretraining.

Analyzing and Improving the Training Dynamics of Diffusion Models

Analyzing and improving the training dynamics of diffusion models.

(Heilongjiang Chapter) Harbin: Snow-Built Jade Towers in the Ice City, Songhua River Waves Stirring Harbin's Affection

(Heilongjiang Chapter) Harbin: Ice and Snow World.

VLCounter: Text-aware Visual Representation for Zero-Shot Object Counting

VLCounter: text-aware visual representation for zero-shot object counting.