Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

The linear scaling rule and gradual warmup for large-minibatch distributed training.
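
A minimal sketch of the two ideas together, assuming the commonly cited reference point of a base learning rate tuned for batch size 256 and a 5-epoch warmup; the exact constants here are assumptions, not values quoted from the paper.

```python
# Minimal sketch: linear scaling rule + gradual warmup.
# Assumed reference point: base_lr tuned for a global batch size of 256.
def scaled_lr(epoch, step, steps_per_epoch,
              base_lr=0.1, base_batch=256,
              global_batch=8192, warmup_epochs=5):
    """Learning rate for the current optimization step."""
    # Linear scaling rule: lr grows in proportion to the global batch size.
    target_lr = base_lr * global_batch / base_batch
    if epoch < warmup_epochs:
        # Gradual warmup: ramp linearly from base_lr up to target_lr.
        progress = (epoch * steps_per_epoch + step) / (warmup_epochs * steps_per_epoch)
        return base_lr + (target_lr - base_lr) * progress
    return target_lr
```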

Multimodal Machine Learning: A Survey and Taxonomy

A survey and taxonomy of multimodal machine learning (representation, translation, alignment, fusion, and co-learning).

Learning Continuous Image Representation with Local Implicit Image Function

LIIF: learning a continuous representation of 2D images.
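
A rough PyTorch-style sketch of the core idea: an MLP maps a local latent code plus a relative coordinate to an RGB value, so the image can be queried at arbitrary continuous positions. The real LIIF pipeline adds feature unfolding, local ensembling, and cell decoding, which are omitted here.

```python
import torch
import torch.nn as nn

class ImplicitDecoder(nn.Module):
    """Sketch of a local implicit image function: (feature, coord) -> RGB."""
    def __init__(self, feat_dim=64, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, feat, rel_coord):
        # feat:      (N, feat_dim) latent code nearest to each query point
        # rel_coord: (N, 2) query coordinate relative to that code's position
        return self.mlp(torch.cat([feat, rel_coord], dim=-1))
```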

AdaX: Adaptive Gradient Descent with Exponential Long Term Memory

AdaX: adaptive gradient descent with exponential long-term memory.
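
A rough sketch of an AdaX-style update from my reading of the paper: unlike Adam's exponential moving average, the second-moment accumulator grows over time, so past gradients are never fully forgotten. The default constants and bias-correction term below are assumptions and may differ from the paper.

```python
import numpy as np

def adax_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=1e-4, eps=1e-8):
    """One AdaX-style step for a single parameter tensor (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad
    # Exponential long-term memory: the accumulator grows instead of decaying.
    v = (1 + beta2) * v + beta2 * grad ** 2
    v_hat = v / ((1 + beta2) ** t - 1)          # bias correction
    param = param - lr * m / (np.sqrt(v_hat) + eps)
    return param, m, v
```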

Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

Adafactor: reducing the memory footprint of Adam.
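
A rough sketch of Adafactor's factored second moment for a weight matrix: instead of storing a full per-parameter accumulator as Adam does, keep row and column statistics and reconstruct the per-parameter scale from their outer product. The decay schedule, update clipping, and relative step sizes from the paper are omitted; function and variable names are mine.

```python
import numpy as np

def factored_second_moment(grad, row_acc, col_acc, beta2=0.999, eps=1e-30):
    """Update factored accumulators for an (n, m) gradient and rebuild v_hat."""
    sq = grad ** 2 + eps
    row_acc = beta2 * row_acc + (1 - beta2) * sq.mean(axis=1)  # shape (n,)
    col_acc = beta2 * col_acc + (1 - beta2) * sq.mean(axis=0)  # shape (m,)
    # Rank-1 reconstruction of the full second-moment matrix.
    v_hat = np.outer(row_acc, col_acc) / row_acc.mean()
    return v_hat, row_acc, col_acc
```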

Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks

NovoGrad: gradient normalization using layer-wise adaptive second moments.
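
A rough sketch of a NovoGrad-style layer-wise update from my reading of the paper: the second moment is a single scalar per layer (a decayed squared gradient norm), and each layer's gradient is normalized by it before being folded into the first-moment buffer together with decoupled weight decay. Initialization details and the exact defaults are assumptions.

```python
import numpy as np

def novograd_step(w, grad, m, v, lr=0.01, beta1=0.95, beta2=0.98,
                  weight_decay=0.001, eps=1e-8):
    """One NovoGrad-style step for one layer's weights w."""
    v = beta2 * v + (1 - beta2) * float(np.sum(grad ** 2))  # per-layer scalar
    m = beta1 * m + (grad / (np.sqrt(v) + eps) + weight_decay * w)
    w = w - lr * m
    return w, m, v
```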