UNITER: UNiversal Image-TExt Representation Learning

UNITER: universal image-text representation learning.

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

Oscar: object-semantics-aligned pre-training for vision-language tasks.

VL-BERT: Pre-training of Generic Visual-Linguistic Representations

VL-BERT: pre-training of generic visual-linguistic representations.

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

ViLBERT: pre-training task-agnostic visiolinguistic representations for vision-and-language tasks.

VisualBERT: A Simple and Performant Baseline for Vision and Language

VisualBERT: a simple and effective vision-language baseline.

LXMERT: Learning Cross-Modality Encoder Representations from Transformers

LXMERT: learning cross-modality encoder representations from Transformers.