ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data

A Brief Review of Echo: A Derivative of a Derivative, Hard to Make an Echo

Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers

UNITER: UNiversal Image-TExt Representation Learning

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

VL-BERT: Pre-training of Generic Visual-Linguistic Representations
