Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
LAMB: combines layer-wise adaptive learning rates with Adam.
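The LAMB idea can be sketched in a few lines: compute an Adam-style update direction, then scale each layer's step by a trust ratio ||w|| / ||update||. A minimal single-layer NumPy sketch follows; the function name, hyperparameter defaults, and the exact trust-ratio clamping are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def lamb_step(w, g, m, v, t=1, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.01):
    # Adam first/second moments with bias correction
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Adam-style direction plus decoupled weight decay
    update = m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w
    # Layer-wise trust ratio: step size proportional to ||w|| / ||update||
    w_norm = np.linalg.norm(w)
    u_norm = np.linalg.norm(update)
    trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    w = w - lr * trust * update
    return w, m, v
```

In a full optimizer the trust ratio is computed per layer (per parameter tensor), which is what lets large-batch training keep layers with small weights from taking oversized steps.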
Visualization methods of Convolutional Neural Networks.
LARS: Layer-wise Adaptive Rate Scaling.
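LARS applies the same layer-wise trust-ratio idea to SGD with momentum rather than Adam: each layer gets a local learning rate proportional to ||w|| / ||g||. A minimal sketch, with the function name and default coefficients chosen for illustration:

```python
import numpy as np

def lars_step(w, g, mom, lr=0.1, momentum=0.9,
              trust_coef=0.001, weight_decay=1e-4):
    # Local (per-layer) learning rate from the norms of weights and gradients
    g_wd = g + weight_decay * w
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(g_wd)
    local_lr = trust_coef * w_norm / g_norm if w_norm > 0 and g_norm > 0 else 1.0
    # Plain momentum SGD, scaled by the local learning rate
    mom = momentum * mom + local_lr * g_wd
    w = w - lr * mom
    return w, mom
```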
Lookahead: update the fast weights k times, then update the slow weights once.
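The k-fast/1-slow loop is simple to write down. A minimal sketch using plain SGD as the inner optimizer (the inner optimizer, function name, and hyperparameters are illustrative; Lookahead wraps any inner optimizer):

```python
import numpy as np

def lookahead_sgd(w, grad_fn, k=5, alpha=0.5, inner_lr=0.1, outer_steps=2):
    slow = w.copy()
    fast = w.copy()
    for _ in range(outer_steps):
        # k fast-weight updates with the inner optimizer (here: SGD)
        for _ in range(k):
            fast -= inner_lr * grad_fn(fast)
        # one slow-weight update: interpolate toward the fast weights
        slow += alpha * (fast - slow)
        # reset the fast weights to the new slow weights
        fast = slow.copy()
    return slow
```

On a convex quadratic (grad_fn returning w itself), the returned slow weights contract toward the minimum at zero.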
RAdam: rectifies the large early-stage variance of Adam's adaptive learning rate.
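The rectification works by estimating how tractable the variance of the adaptive term is at step t: when it is (rho_t > 4), RAdam applies a rectified adaptive step; in the earliest steps it falls back to momentum SGD. A minimal NumPy sketch of this mechanism (function name and defaults are illustrative):

```python
import math
import numpy as np

def radam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    # Length of the approximated SMA and its maximum
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * t * beta2 ** t / (1.0 - beta2 ** t)
    if rho_t > 4.0:
        # Variance is tractable: rectified adaptive update
        v_hat = np.sqrt(v / (1 - beta2 ** t))
        r_t = math.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf)
                        / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
        w = w - lr * r_t * m_hat / (v_hat + eps)
    else:
        # Early steps: fall back to plain momentum SGD
        w = w - lr * m_hat
    return w, m, v
```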
Nadam: incorporates Nesterov momentum into Adam.
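Nadam's change to Adam is small: the first-moment estimate gets a Nesterov-style lookahead by mixing in the current gradient before the step. A minimal sketch (function name and defaults are illustrative assumptions):

```python
import numpy as np

def nadam_step(w, g, m, v, t, lr=2e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Nesterov-style lookahead: blend the momentum estimate with the
    # current (bias-corrected) gradient
    m_bar = beta1 * m_hat + (1 - beta1) * g / (1 - beta1 ** t)
    w = w - lr * m_bar / (np.sqrt(v_hat) + eps)
    return w, m, v
```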