Zheng Zhijie's Personal Blog
Welcome
Mish: A Self Regularized Non-Monotonic Activation Function
Mish: a self-regularized non-monotonic activation function.
Taylor Formula
The Taylor formula.
XLNet: Generalized Autoregressive Pretraining for Language Understanding
XLNet: training language models with permutation language modeling.
MASS: Masked Sequence to Sequence Pre-training for Language Generation
MASS: sequence-to-sequence masked language modeling.
Unified Language Model Pre-training for Natural Language Understanding and Generation
UniLM: sequence-to-sequence pre-training with BERT.
RoBERTa: A Robustly Optimized BERT Pretraining Approach
RoBERTa: a robustly optimized BERT pretraining approach.