Implicit Gradient Regularization

Implicit gradient regularization.

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

The linear scaling rule and warmup for large-batch distributed training.
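
As a rough illustration of the two ideas named in this entry, the sketch below scales a reference learning rate in proportion to the batch size and then ramps up to that scaled rate linearly. The function names, the reference batch size of 256, and the five-step warmup length are assumptions chosen for the example, not values taken from this list.

```python
def scaled_lr(base_lr: float, batch_size: int, base_batch_size: int = 256) -> float:
    """Linear scaling rule: multiply the learning rate by batch_size / base_batch_size."""
    return base_lr * batch_size / base_batch_size


def warmup_lr(step: int, warmup_steps: int, base_lr: float, target_lr: float) -> float:
    """Gradual warmup: ramp linearly from base_lr to target_lr over warmup_steps."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return target_lr
    return base_lr + (target_lr - base_lr) * step / warmup_steps


# Illustrative values (assumed): reference rate 0.1 at batch size 256, scaled to 8192.
target = scaled_lr(base_lr=0.1, batch_size=8192)
schedule = [warmup_lr(s, warmup_steps=5, base_lr=0.1, target_lr=target) for s in range(7)]
```

In practice the warmup would typically be counted in iterations or epochs of the real training loop, and the regular decay schedule would take over once the target rate is reached.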

Multimodal Machine Learning: A Survey and Taxonomy

A survey of multimodal machine learning (representation, translation, alignment, fusion, and co-learning).

Learning Continuous Image Representation with Local Implicit Image Function

LIIF: learning a continuous representation of 2D images.

AdaX: Adaptive Gradient Descent with Exponential Long Term Memory

AdaX: adaptive gradient descent with exponential long-term memory.

Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

Adafactor: reducing Adam's GPU memory usage.
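
To give a feel for where the memory saving comes from, here is a minimal sketch of a factored second-moment estimate: for an n x m gradient matrix it stores only the row sums and column sums of the squared gradients (n + m numbers) and rebuilds an n x m estimate from them. This is a simplified, single-step illustration; the function names are hypothetical, and the moving averages, epsilon terms, and update clipping of the full optimizer are deliberately left out.

```python
def factored_second_moment(grad):
    """Return row sums, column sums, and the total of the squared gradients."""
    sq = [[g * g for g in row] for row in grad]
    row_sums = [sum(row) for row in sq]
    col_sums = [sum(col) for col in zip(*sq)]
    return row_sums, col_sums, sum(row_sums)


def reconstruct(row_sums, col_sums, total):
    """Rank-1 estimate of the squared-gradient matrix: row_i * col_j / total."""
    if total == 0:
        return [[0.0 for _ in col_sums] for _ in row_sums]
    return [[r * c / total for c in col_sums] for r in row_sums]


# Toy 2 x 2 gradient (assumed values) whose squared entries form a rank-1 matrix,
# so the factored estimate recovers them exactly.
grad = [[1.0, 2.0], [2.0, 4.0]]
rows, cols, total = factored_second_moment(grad)
estimate = reconstruct(rows, cols, total)
```

For a general gradient matrix the reconstruction is only an approximation; the point of the sketch is that the stored state grows with n + m rather than n * m.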