Large Batch Training of Convolutional Networks

LARS: layer-wise adaptive learning-rate scaling for large-batch training.
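
A minimal sketch of a per-layer LARS step, assuming an SGD-with-momentum base update; `lars_layer_step` and its default hyperparameters are illustrative, not the paper's reference implementation:

```python
import torch

def lars_layer_step(param, momentum_buf, lr=0.1, momentum=0.9,
                    weight_decay=1e-4, trust_coefficient=0.001):
    # One illustrative LARS step for a single layer's parameter tensor.
    grad = param.grad
    w_norm = param.detach().norm()
    g_norm = grad.norm()
    # Layer-wise trust ratio: eta * ||w|| / (||grad|| + wd * ||w||).
    if w_norm > 0 and g_norm > 0:
        local_lr = (trust_coefficient * w_norm /
                    (g_norm + weight_decay * w_norm)).item()
    else:
        local_lr = 1.0
    update = grad + weight_decay * param.detach()
    momentum_buf.mul_(momentum).add_(update, alpha=lr * local_lr)
    param.data.add_(momentum_buf, alpha=-1.0)
    return momentum_buf
```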

Lookahead Optimizer: k steps forward, 1 step back

Lookahead: update the fast weights k times, then update the slow weights once.
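
A rough sketch of the Lookahead wrapping logic around a standard PyTorch inner optimizer; `lookahead_train_step` and `slow_weights` are hypothetical names for this illustration:

```python
import torch

def init_slow_weights(model):
    # Slow weights start as a copy of the model's (fast) weights.
    return [p.detach().clone() for p in model.parameters()]

def lookahead_train_step(model, inner_optimizer, slow_weights, step, k=5, alpha=0.5):
    # The inner optimizer updates the fast weights every step; every k steps
    # the slow weights move toward them and the fast weights are reset.
    inner_optimizer.step()
    if (step + 1) % k == 0:
        for p, slow in zip(model.parameters(), slow_weights):
            slow.add_(p.data - slow, alpha=alpha)  # slow <- slow + alpha * (fast - slow)
            p.data.copy_(slow)                     # restart fast weights from slow weights
```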

On the Variance of the Adaptive Learning Rate and Beyond

RAdam: rectifies the large variance of Adam's adaptive learning rate in the early training steps.
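
A small sketch of RAdam's variance rectification term as described in the paper; `radam_rectification` is an illustrative helper, not a library function:

```python
import math

def radam_rectification(step, beta2=0.999):
    # Length of the approximated simple moving average (SMA); the adaptive step
    # is only used once its variance is tractable (rho_t > 4).
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * step * beta2 ** step / (1.0 - beta2 ** step)
    if rho_t > 4.0:
        r_t = math.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf)
                        / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
        return r_t   # scale the adaptive step by this rectification term
    return None      # fall back to an un-adapted (SGD-with-momentum style) step
```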

Incorporating Nesterov Momentum into Adam

NAdam: incorporates Nesterov momentum into Adam.
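
A simplified NAdam step (constant momentum, no momentum-decay schedule) to show where the Nesterov-style correction enters; `nadam_update` is an illustrative helper, not the paper's exact algorithm:

```python
import torch

def nadam_update(param, grad, m, v, step, lr=2e-3,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    # `step` counts from 1. Same moment estimates as Adam, but the bias-corrected
    # first moment is replaced by a Nesterov-style blend of the momentum estimate
    # and the current gradient.
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    m_hat = (beta1 * m / (1 - beta1 ** (step + 1))
             + (1 - beta1) * grad / (1 - beta1 ** step))
    v_hat = v / (1 - beta2 ** step)
    param.data.addcdiv_(m_hat, v_hat.sqrt().add_(eps), value=-lr)
```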

The Hook Mechanism in PyTorch

Hooks: callbacks registered on PyTorch modules or tensors that fire during the forward/backward pass, used to inspect or modify intermediate results.
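
A short example of the forward-hook API; the toy model and the hook names are illustrative:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
activations = {}

def save_activation(name):
    # Forward hook signature: hook(module, inputs, output).
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Register hooks on the layers we want to inspect.
handles = [model[0].register_forward_hook(save_activation("fc1")),
           model[2].register_forward_hook(save_activation("fc2"))]

out = model(torch.randn(3, 4))   # hooks fire during the forward pass
print(activations["fc1"].shape)  # torch.Size([3, 8])

for h in handles:
    h.remove()                   # remove hooks when no longer needed
```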

On the Convergence of Adam and Beyond

AMSGrad: improves the convergence of Adam.
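
A simplified AMSGrad step to show the single change relative to Adam, namely the running maximum of the second-moment estimate; `amsgrad_update` is an illustrative helper:

```python
import torch

def amsgrad_update(param, grad, m, v, v_max, step, lr=1e-3,
                   beta1=0.9, beta2=0.999, eps=1e-8):
    # `step` counts from 1. Identical to Adam except that the denominator uses
    # the running maximum of the second-moment estimate, so the effective step
    # size is non-increasing.
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    torch.maximum(v_max, v, out=v_max)        # v_max = max(v_max, v)
    m_hat = m / (1 - beta1 ** step)
    param.data.addcdiv_(m_hat, v_max.sqrt().add(eps), value=-lr)
```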