On the Variance of the Adaptive Learning Rate and Beyond

RAdam: rectifying the early-stage variance of the adaptive learning rate in Adam.
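
A minimal sketch of the RAdam update on a single scalar parameter, in plain Python. Several things here are illustrative assumptions: the function name `radam_minimize`, the toy objective f(x) = (x - 3)^2, the hyperparameters, and the added `eps`. The rectification term r_t and the switch on ρ_t > 4 follow the RAdam formulation, although some library implementations use a slightly larger threshold. During the first few steps, too few samples exist to estimate the variance of the adaptive term, so the sketch falls back to a plain momentum step.

```python
import math


def radam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Hypothetical helper: RAdam on a 1-D problem.

    Returns the final x and the list of steps that used the rectified adaptive update.
    """
    rho_inf = 2 / (1 - beta2) - 1  # maximum length of the approximated simple moving average
    x, m, v = x0, 0.0, 0.0
    rectified_steps = []
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias-corrected momentum
        rho_t = rho_inf - 2 * t * beta2 ** t / (1 - beta2 ** t)
        if rho_t > 4:
            # Variance is tractable: adaptive step scaled by the rectification term r_t.
            r_t = math.sqrt((rho_t - 4) * (rho_t - 2) * rho_inf
                            / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
            l_t = math.sqrt(1 - beta2 ** t) / (math.sqrt(v) + eps)  # approx. 1 / sqrt(v_hat)
            x -= lr * r_t * m_hat * l_t
            rectified_steps.append(t)
        else:
            # Too few samples: un-adapted momentum step, no 1/sqrt(v) scaling.
            x -= lr * m_hat
    return x, rectified_steps


# Toy objective (assumption): f(x) = (x - 3)^2, gradient 2(x - 3).
x_final, rectified = radam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_final, rectified[0])
```

With beta2 = 0.999, ρ_t grows roughly like t early on. The sketch therefore takes four plain momentum steps before the rectified update switches on.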

Incorporating Nesterov Momentum into Adam

Nadam: incorporating Nesterov momentum into Adam.
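
A minimal sketch of Nadam on a scalar parameter. Dozat's paper uses a momentum schedule μ_t. This sketch assumes a constant beta1, which reduces the update to the commonly quoted simplified form: the bias-corrected momentum of the *next* step is combined with the current gradient before the adaptive scaling. The name `nadam_minimize`, the toy objective, and the hyperparameters are illustrative assumptions.

```python
import math


def nadam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Hypothetical helper: Nadam (constant beta1) on a 1-D problem."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Nesterov-style look-ahead: next step's momentum term plus the current gradient,
        # each with its own bias correction.
        m_bar = beta1 * m / (1 - beta1 ** (t + 1)) + (1 - beta1) * g / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_bar / (math.sqrt(v_hat) + eps)
    return x


# Toy objective (assumption): f(x) = (x - 3)^2, gradient 2(x - 3).
x_final = nadam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_final)
```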

The Hook Mechanism in PyTorch

Hook mechanism in PyTorch.
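
A short sketch of two kinds of PyTorch hooks, using the standard `register_forward_hook` and `Tensor.register_hook` APIs. A module forward hook is called as `hook(module, inputs, output)` after `forward` runs, and here it is used to capture an intermediate activation. A tensor hook receives the gradient with respect to that tensor, and a returned tensor replaces it. The small model, the `activations` dict, and the helper `save_activation` are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

# Module forward hook: records the output of the first Linear layer.
activations = {}


def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()  # keep a copy detached from the autograd graph
    return hook


handle = model[0].register_forward_hook(save_activation("fc1"))
out = model(torch.randn(5, 4))
handle.remove()  # hooks stay registered until the returned handle is removed

# Tensor hook: the returned tensor replaces the gradient flowing into w.
w = torch.ones(3, requires_grad=True)
w.register_hook(lambda grad: grad * 2)  # double the incoming gradient
(3 * w).sum().backward()

print(activations["fc1"].shape)  # torch.Size([5, 3])
print(w.grad)                    # tensor([6., 6., 6.])
```

Without the hook, `w.grad` would be 3 for each element. The hook doubles it before it is accumulated into `.grad`.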

On the Convergence of Adam and Beyond

AMSGrad: improving the convergence of Adam.
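
A minimal sketch of AMSGrad on a scalar parameter. The only change from Adam is that the denominator uses the running maximum of the second-moment estimate, so the effective per-parameter step size never increases. The sketch assumes a constant beta1 and no bias correction. The `eps` term, the name `amsgrad_minimize`, the toy objective, and the hyperparameters are illustrative assumptions.

```python
import math


def amsgrad_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Hypothetical helper: AMSGrad on a 1-D problem; returns final x and the v_max history."""
    x, m, v, v_max = x0, 0.0, 0.0, 0.0
    history = []
    for _ in range(steps):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        v_max = max(v_max, v)  # never let the adaptive denominator shrink
        history.append(v_max)
        x -= lr * m / (math.sqrt(v_max) + eps)
    return x, history


# Toy objective (assumption): f(x) = (x - 3)^2, gradient 2(x - 3).
x_final, v_max_history = amsgrad_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_final)
```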

Adam: A Method for Stochastic Optimization

Adam: adaptive moment estimation.
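
A minimal sketch of the Adam update on a scalar parameter. It keeps exponential moving averages of the gradient and of its square, bias-corrects both (they start at zero), and scales the step by m_hat / sqrt(v_hat). beta1, beta2, and eps match the paper's suggested defaults. The learning rate 0.1, the name `adam_minimize`, and the toy objective are illustrative assumptions chosen so the example converges quickly.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Hypothetical helper: minimize a 1-D function with Adam, given its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g       # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g   # second-moment (uncentered variance) estimate
        m_hat = m / (1 - beta1 ** t)          # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective (assumption): f(x) = (x - 3)^2, gradient 2(x - 3).
x_final = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_final)
```

On the first step m_hat / sqrt(v_hat) equals the sign of the gradient, so the initial move is about `lr` regardless of the gradient's scale.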

On the importance of initialization and momentum in deep learning

Nesterov Momentum: a momentum-based gradient update method.
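
A minimal sketch contrasting classical momentum with Nesterov's accelerated gradient, in the velocity form v_{t+1} = μ v_t - ε ∇f(θ_t + μ v_t), θ_{t+1} = θ_t + v_{t+1}. The only difference is where the gradient is evaluated: classical momentum uses the current point, while Nesterov uses the look-ahead point θ_t + μ v_t. The name `momentum_minimize`, the toy objective, and the hyperparameters are illustrative assumptions.

```python
def momentum_minimize(grad, x0, lr=0.1, mu=0.9, steps=300, nesterov=True):
    """Hypothetical helper: classical or Nesterov momentum on a 1-D problem."""
    x, v = x0, 0.0
    for _ in range(steps):
        # Nesterov evaluates the gradient at the look-ahead point x + mu * v.
        g = grad(x + mu * v) if nesterov else grad(x)
        v = mu * v - lr * g
        x = x + v
    return x


# Toy objective (assumption): f(x) = (x - 3)^2, gradient 2(x - 3).
f_grad = lambda x: 2 * (x - 3)
x_nag = momentum_minimize(f_grad, 0.0, nesterov=True)
x_classical = momentum_minimize(f_grad, 0.0, nesterov=False)
print(x_nag, x_classical)
```

On this quadratic with lr = 0.1 and mu = 0.9, the look-ahead gradient damps the oscillation that classical momentum shows. Both variants still converge.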