Continuously Differentiable Exponential Linear Units

CELU: continuously differentiable exponential linear units.
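As a quick reference, the CELU activation from this paper is max(0, x) + min(0, α(exp(x/α) − 1)); for α = 1 it coincides with ELU. A minimal sketch (the function name and the default α = 1 are illustrative choices, not from the paper):

```python
import math

def celu(x: float, alpha: float = 1.0) -> float:
    """CELU(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1))."""
    return max(0.0, x) + min(0.0, alpha * (math.exp(x / alpha) - 1.0))

# Positive inputs pass through unchanged; negative inputs saturate at -alpha.
# celu(2.0) -> 2.0, celu(-x) -> -alpha as x grows large.
```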

Mish: A Self Regularized Non-Monotonic Activation Function

Mish: a self-regularized, non-monotonic activation function.
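Mish is defined in the paper as x · tanh(softplus(x)). A minimal sketch (the helper names are illustrative; the stable softplus rewrite is a standard numerical trick, not from the paper):

```python
import math

def softplus(x: float) -> float:
    # Numerically stable softplus: log(1 + exp(x)),
    # rewritten to avoid overflow for large |x|.
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def mish(x: float) -> float:
    """Mish(x) = x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))

# Smooth and non-monotonic: mish(0) == 0, and for small negative x
# the output dips slightly below zero before saturating.
```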

Taylor Formula (泰勒公式)

Taylor Formula.
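As a worked example of the Taylor formula, the n-th order Taylor polynomial of e^x around 0 is Σ_{k=0}^{n} x^k / k!, and the remainder shrinks factorially. A minimal numeric sketch (function name is illustrative):

```python
import math

def taylor_exp(x: float, n: int) -> float:
    """n-th order Taylor polynomial of e^x around 0: sum_{k=0}^{n} x^k / k!."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

# Higher order -> smaller remainder; at order 20 the approximation of e
# agrees with math.e to machine precision.
```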

XLNet: Generalized Autoregressive Pretraining for Language Understanding

XLNet: training language models with permutation language modeling.

MASS: Masked Sequence to Sequence Pre-training for Language Generation

MASS: masked language modeling for sequence-to-sequence pre-training.
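MASS masks a contiguous fragment of the encoder input and trains the decoder to reconstruct that fragment. A minimal sketch of the masking step only (function and token names are illustrative, and real implementations sample the span position and length):

```python
def mass_mask(tokens, start, length, mask_token="[MASK]"):
    """MASS-style masking: replace a contiguous span of the encoder input
    with mask tokens, and return the hidden fragment as the decoder target."""
    encoder_input = tokens[:start] + [mask_token] * length + tokens[start + length:]
    decoder_target = tokens[start:start + length]
    return encoder_input, decoder_target

# Example: mask 3 tokens starting at position 2.
# mass_mask(["a", "b", "c", "d", "e", "f"], 2, 3)
# -> (["a", "b", "[MASK]", "[MASK]", "[MASK]", "f"], ["c", "d", "e"])
```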

Unified Language Model Pre-training for Natural Language Understanding and Generation

UniLM: sequence-to-sequence pre-training with BERT.