Why gradient clipping accelerates training: A theoretical justification for adaptivity

Why gradient clipping can speed up training: a theoretical basis for adaptivity.

Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow

Variational discriminator bottleneck: improving deep learning models by constraining information flow.

AdderNet: Do We Really Need Multiplications in Deep Learning?

AdderNet: a convolutional neural network that uses only addition operations.

Plotting Training Curves with Matplotlib

Draw training curves via Matplotlib.

Deep Variational Information Bottleneck

The deep variational information bottleneck.

Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning

Virtual adversarial training: a regularization method for supervised and semi-supervised learning.