RoBERTa: A Robustly Optimized BERT Pretraining Approach

RoBERTa: a robustly optimized BERT pretraining approach (more data, larger batches, longer training, dynamic masking, and no next-sentence prediction objective).

Efficient Attention: Attention with Linear Complexities

An efficient self-attention mechanism with linear complexity, obtained by reordering the attention computation so the n x n attention map is never formed.
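
A minimal NumPy sketch of that reordering idea (the function name and shapes are my own, not the paper's code): normalize Q over the feature axis and K over the sequence axis, then multiply K^T with V first, so the cost stays linear in sequence length.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def efficient_attention(Q, K, V):
    """Linear-complexity attention sketch: no n x n score matrix is built.
    Shapes: Q, K -> (n, d_k), V -> (n, d_v); cost is O(n * d_k * d_v)."""
    q = softmax(Q, axis=-1)   # (n, d_k), each row sums to 1
    k = softmax(K, axis=0)    # (n, d_k), each column sums to 1 over positions
    context = k.T @ V         # (d_k, d_v) global context, independent of n^2
    return q @ context        # (n, d_v)

# toy usage with random data
rng = np.random.default_rng(0)
n, d_k, d_v = 8, 4, 4
out = efficient_attention(rng.normal(size=(n, d_k)),
                          rng.normal(size=(n, d_k)),
                          rng.normal(size=(n, d_v)))
print(out.shape)  # (8, 4)
```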

Longformer: The Long-Document Transformer

Longformer: a Transformer for long documents that replaces full self-attention with sliding-window (local) attention plus global attention on a few selected tokens.
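
A toy illustration of the sliding-window masking idea only (the real model uses banded attention kernels and adds global tokens; this dense masked version and the function name are mine):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sliding_window_attention(Q, K, V, window=2):
    """Local attention: position i may only attend to positions j with |i - j| <= window."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                    # dense here; banded in practice
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) > window
    scores = np.where(mask, -1e9, scores)            # block out-of-window positions
    return softmax(scores) @ V

rng = np.random.default_rng(3)
n, d = 10, 4
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
print(sliding_window_attention(Q, K, V).shape)       # (10, 4)
```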

Linformer: Self-Attention with Linear Complexity

Linformer: self-attention with linear complexity, obtained by projecting keys and values along the sequence dimension down to a fixed length.
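
A short sketch of that projection, assuming learned matrices E and F of shape (k, n) as in the paper; here they are random stand-ins and the function name is mine:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linformer_attention(Q, K, V, E, F):
    """E and F (k x n) shrink keys and values along the sequence axis,
    so the score matrix is n x k instead of n x n."""
    d = Q.shape[-1]
    K_proj = E @ K                                 # (k, d)
    V_proj = F @ V                                 # (k, d_v)
    scores = softmax(Q @ K_proj.T / np.sqrt(d))    # (n, k) -- linear in n
    return scores @ V_proj                         # (n, d_v)

rng = np.random.default_rng(0)
n, k, d = 16, 4, 8
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
E = rng.normal(size=(k, n)) / np.sqrt(n)           # stand-ins for learned projections
F = rng.normal(size=(k, n)) / np.sqrt(n)
print(linformer_attention(Q, K, V, E, F).shape)    # (16, 8)
```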

Rethinking Attention with Performers

Performer: linearizes the complexity of attention by approximating the softmax kernel with random feature projections.
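
A simplified sketch of the positive-random-feature idea, without the orthogonal-feature and redrawing refinements from the paper (function names and the feature count are my own choices):

```python
import numpy as np

def random_feature_map(X, W):
    """Positive random features for the softmax kernel:
    phi(x) = exp(W x - ||x||^2 / 2) / sqrt(m), with rows of W ~ N(0, I)."""
    m = W.shape[0]
    proj = X @ W.T                                  # (n, m)
    return np.exp(proj - 0.5 * (X ** 2).sum(-1, keepdims=True)) / np.sqrt(m)

def performer_attention(Q, K, V, num_features=64, seed=0):
    """Linear-time approximation of softmax attention: map Q and K through the
    feature map, then compute phi(K)^T V first, avoiding the n x n matrix."""
    d = Q.shape[-1]
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(num_features, d))
    q = random_feature_map(Q / d ** 0.25, W)        # rescaling mimics the 1/sqrt(d) in softmax attention
    k = random_feature_map(K / d ** 0.25, W)
    kv = k.T @ V                                    # (m, d_v)
    normalizer = q @ k.sum(axis=0, keepdims=True).T # (n, 1)
    return (q @ kv) / (normalizer + 1e-6)           # (n, d_v)

rng = np.random.default_rng(1)
n, d = 32, 16
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
print(performer_attention(Q, K, V).shape)           # (32, 16)
```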

Reformer: The Efficient Transformer

Reformer: an efficient Transformer built from locality-sensitive-hashing (LSH) attention and reversible residual layers.
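
A toy sketch of the LSH bucketing idea only: the paper adds multi-round hashing, sorted chunking, causal masking, and the reversible layers, none of which are shown here, and the function names are mine.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lsh_buckets(X, n_buckets, seed=0):
    """Angular LSH: random rotation R, then bucket = argmax([xR ; -xR]).
    Similar vectors tend to land in the same bucket."""
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(X.shape[-1], n_buckets // 2))
    rotated = X @ R
    return np.argmax(np.concatenate([rotated, -rotated], axis=-1), axis=-1)

def lsh_attention(QK, V, n_buckets=4):
    """Toy LSH attention with shared Q/K: each position attends only to
    positions in the same hash bucket, so cost scales with bucket size."""
    n, d = QK.shape
    buckets = lsh_buckets(QK, n_buckets)
    out = np.zeros_like(V)
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]              # positions in this bucket
        scores = softmax(QK[idx] @ QK[idx].T / np.sqrt(d))
        out[idx] = scores @ V[idx]
    return out

rng = np.random.default_rng(2)
n, d = 16, 8
QK = rng.normal(size=(n, d))
print(lsh_attention(QK, rng.normal(size=(n, d))).shape)  # (16, 8)
```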