XLNet: Generalized Autoregressive Pretraining for Language Understanding

XLNet: training a language model with permutation language modeling.

MASS: Masked Sequence to Sequence Pre-training for Language Generation

MASS: masked language modeling for sequence-to-sequence pre-training.

Unified Language Model Pre-training for Natural Language Understanding and Generation

UniLM: sequence-to-sequence pre-training with BERT.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

RoBERTa: a robustly optimized BERT pre-training approach.

Efficient Attention: Attention with Linear Complexities

Efficient Attention: an efficient self-attention mechanism with linear complexity.

Longformer: The Long-Document Transformer

Longformer: a Transformer for long documents.