Zheng Zhijie's Personal Blog
Welcome
Memory-Efficient Adaptive Optimization
SM3: a memory-efficient adaptive optimization algorithm.
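A minimal NumPy sketch of the core SM3 idea for a single 2-D weight matrix, assuming rows and columns are used as the cover sets: instead of one second-moment accumulator per entry (O(n·m) memory), only one accumulator per row and per column (O(n+m)) is stored. The function name `sm3_step`, the hyperparameters, and the toy quadratic gradient are illustrative placeholders, not taken from the post or the paper's reference implementation.

```python
import numpy as np

def sm3_step(w, grad, row_acc, col_acc, lr=0.1, eps=1e-8):
    # Effective accumulator for entry (i, j): the tighter (smaller) of its
    # row and column statistics, plus the fresh squared gradient.
    nu = np.minimum(row_acc[:, None], col_acc[None, :]) + grad ** 2
    w -= lr * grad / (np.sqrt(nu) + eps)
    # Each cover set keeps the max of the per-entry accumulators it contains.
    row_acc[:] = nu.max(axis=1)
    col_acc[:] = nu.max(axis=0)
    return w

# Toy usage: minimize 0.5 * ||w||^2 for a 4x3 weight matrix.
w = np.random.randn(4, 3)
row_acc, col_acc = np.zeros(4), np.zeros(3)
for _ in range(100):
    grad = w.copy()  # gradient of 0.5 * ||w||^2, purely for illustration
    w = sm3_step(w, grad, row_acc, col_acc)
```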
Averaging Weights Leads to Wider Optima and Better Generalization
SWA: finding wider minima via stochastic weight averaging.
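A minimal sketch of the weight-averaging loop behind SWA, assuming a generic `train_one_epoch` routine and a roughly constant learning rate once averaging starts; all names, hyperparameters, and the toy usage below are placeholders rather than the post's code.

```python
import numpy as np

def train_with_swa(w, train_one_epoch, epochs=100, swa_start=75):
    swa_w, n_avg = np.zeros_like(w), 0
    for epoch in range(epochs):
        w = train_one_epoch(w)          # ordinary SGD updates
        if epoch >= swa_start:          # average weights along the tail
            swa_w = (swa_w * n_avg + w) / (n_avg + 1)
            n_avg += 1
    return swa_w  # averaged weights; BatchNorm statistics must be recomputed

# Toy usage: "one epoch" = one SGD step on 0.5 * ||w||^2.
w0 = np.random.randn(10)
swa_weights = train_with_swa(w0, lambda w: w - 0.1 * w)
```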
Decoupled Weight Decay Regularization
AdamW: decoupling weight decay regularization from gradient descent.
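A minimal sketch of one AdamW update step, showing the decoupling: the decay term is applied directly to the weights instead of being folded into the gradient (as plain Adam with L2 regularization would do). The default hyperparameter values and the toy loop are illustrative, not taken from the post.

```python
import numpy as np

def adamw_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
               eps=1e-8, weight_decay=1e-2):
    m = b1 * m + (1 - b1) * grad           # decay term is NOT added to grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    w = w - lr * weight_decay * w          # decay applied directly to weights
    return w, m, v

# Toy usage: minimize 0.5 * ||w||^2.
w = np.random.randn(5)
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 201):
    grad = w  # gradient of 0.5 * ||w||^2, purely for illustration
    w, m, v = adamw_step(w, grad, m, v, t)
```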
ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks
ULSAM: an ultra-lightweight subspace attention module.
On Layer Normalization in the Transformer Architecture
Layer normalization in the Transformer architecture.
A^2-Nets: Double Attention Networks
A^2-Net: a double attention network.