LARS: layer-wise adaptive learning-rate scaling; each layer's learning rate is scaled by a trust ratio computed from that layer's weight norm and gradient norm.
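A minimal sketch of one common LARS variant (weight decay folded into the gradient before the trust ratio is taken); `lars_step` and its defaults are illustrative names, not a library API:

```python
import torch

def lars_step(params, lr=0.1, eta=0.001, weight_decay=1e-4):
    # For each layer, scale the global LR by the trust ratio
    # ||w|| / ||grad + wd * w||, so layers whose gradients are small
    # relative to their weights still take reasonably sized steps.
    with torch.no_grad():
        for w in params:
            if w.grad is None:
                continue
            g = w.grad + weight_decay * w
            w_norm, g_norm = w.norm(), g.norm()
            trust_ratio = w_norm / g_norm if w_norm > 0 and g_norm > 0 else 1.0
            w -= lr * eta * trust_ratio * g
```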
Lookahead: the fast weights take k update steps, then the slow weights take one step, interpolating toward the fast weights.
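A sketch of the Lookahead loop around an arbitrary inner optimizer; `lookahead_train` and the training-loop plumbing (`batches`, `loss_fn`) are assumed for illustration:

```python
import torch

def lookahead_train(model, inner_opt, loss_fn, batches, k=5, alpha=0.5):
    # Slow weights start as a detached copy of the (fast) model weights.
    slow = [p.detach().clone() for p in model.parameters()]
    for step, (x, y) in enumerate(batches, start=1):
        inner_opt.zero_grad()
        loss_fn(model(x), y).backward()
        inner_opt.step()                      # fast weights: updated every step
        if step % k == 0:                     # slow weights: updated once per k steps
            with torch.no_grad():
                for p, s in zip(model.parameters(), slow):
                    s += alpha * (p - s)      # slow <- slow + alpha * (fast - slow)
                    p.copy_(s)                # fast weights restart from slow weights
```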
RAdam: rectifies the high variance of Adam's adaptive learning rate during the early training steps.
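The core of RAdam is a rectification factor applied to the adaptive step, per the RAdam paper; `radam_rectifier` is an illustrative helper name:

```python
import math

def radam_rectifier(t, beta2=0.999):
    # rho_inf: asymptotic length of the approximated SMA of squared
    # gradients; rho_t: its value at step t.
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * t * beta2**t / (1.0 - beta2**t)
    if rho_t <= 4.0:
        # Variance of the adaptive LR is intractable this early:
        # fall back to a plain momentum (SGD-like) step.
        return None
    # Factor that shrinks the adaptive step while the variance estimate
    # is unreliable; it approaches 1 as t grows.
    return math.sqrt((rho_t - 4) * (rho_t - 2) * rho_inf /
                     ((rho_inf - 4) * (rho_inf - 2) * rho_t))
```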
NAdam: incorporates Nesterov momentum into Adam.
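A simplified single-parameter NAdam step (omitting the momentum-decay schedule used in the paper and in PyTorch's implementation); `nadam_step` is an illustrative name, and `m`, `v` are assumed to be pre-allocated zero tensors:

```python
import torch

def nadam_step(w, m, v, t, lr=2e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    g = w.grad
    m.mul_(beta1).add_(g, alpha=1 - beta1)           # first-moment estimate
    v.mul_(beta2).addcmul_(g, g, value=1 - beta2)    # second-moment estimate
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Nesterov flavor: mix the bias-corrected momentum with the
    # *current* gradient instead of using the momentum alone.
    m_nesterov = beta1 * m_hat + (1 - beta1) * g / (1 - beta1 ** t)
    with torch.no_grad():
        w -= lr * m_nesterov / (v_hat.sqrt() + eps)
```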
Hook mechanism in PyTorch: callbacks registered on modules or tensors that run during the forward/backward pass, used to inspect or modify activations and gradients.
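A small example of the two most common hook types, a module forward hook and a tensor gradient hook:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

def forward_hook(module, inputs, output):
    # Runs right after module.forward; useful for inspecting activations.
    print(f"{module.__class__.__name__}: output shape {tuple(output.shape)}")

def grad_hook(grad):
    # Runs when the gradient w.r.t. this tensor is computed; returning
    # a tensor here would replace the gradient.
    print("grad norm:", grad.norm().item())

h1 = model[0].register_forward_hook(forward_hook)
x = torch.randn(3, 4, requires_grad=True)
h2 = x.register_hook(grad_hook)

model(x).sum().backward()
h1.remove()   # register_* calls return handles; remove hooks when done
h2.remove()
```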
AMSGrad: improves Adam's convergence guarantees by using the running elementwise maximum of the second-moment estimate in the denominator, so the effective learning rate never increases.
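A simplified single-parameter sketch of the AMSGrad modification (bias-correction details vary between the original paper and common implementations; here only the first moment is corrected); `amsgrad_step` is an illustrative name and `m`, `v`, `v_max` are assumed pre-allocated zero tensors:

```python
import torch

def amsgrad_step(w, m, v, v_max, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    g = w.grad
    m.mul_(beta1).add_(g, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(g, g, value=1 - beta2)
    # The only change vs. Adam: keep the elementwise running maximum
    # of v and use it in the denominator.
    torch.maximum(v_max, v, out=v_max)
    m_hat = m / (1 - beta1 ** t)
    with torch.no_grad():
        w -= lr * m_hat / (v_max.sqrt() + eps)
```

In PyTorch this variant is available directly via `torch.optim.Adam(params, amsgrad=True)`.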