Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks NovoGrad:使用层级自适应二阶矩进行梯度归一化.