Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

Linear Transformer: a fast autoregressive Transformer built on linear attention.
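The entry above only names the mechanism; below is a minimal, non-causal sketch (assuming PyTorch, with the feature map elu(x)+1 used in the paper) of how softmax attention can be replaced by a kernel feature map so that cost grows linearly with sequence length. The causal variant instead keeps running sums of the key–value products, which is what makes it behave like an RNN.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Non-causal linear attention: softmax(QK^T)V is approximated by
    phi(Q) (phi(K)^T V) / (phi(Q) phi(K)^T 1), computed in O(N) rather than O(N^2).
    q, k, v: (batch, seq_len, dim)."""
    phi_q = F.elu(q) + 1.0                                 # positive feature map phi(x) = elu(x) + 1
    phi_k = F.elu(k) + 1.0
    kv = torch.einsum('bnd,bne->bde', phi_k, v)            # sum_n phi(k_n) v_n^T
    z = torch.einsum('bnd,bd->bn', phi_q, phi_k.sum(1))    # normalizer phi(q_n)^T sum_m phi(k_m)
    return torch.einsum('bnd,bde->bne', phi_q, kv) / (z.unsqueeze(-1) + eps)

x = torch.randn(2, 128, 64)
print(linear_attention(x, x, x).shape)                     # torch.Size([2, 128, 64])
```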

Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks

External Attention: an attention mechanism based on two external memory units.
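A minimal sketch of the idea (assuming PyTorch; the class name `ExternalAttention` and `mem_size` are illustrative): the keys and values of self-attention are replaced by two small learnable external memories implemented as linear layers, so cost is linear in the number of tokens.

```python
import torch
import torch.nn as nn

class ExternalAttention(nn.Module):
    """External attention sketch: shared learnable memories M_k / M_v stand in for
    the per-input keys and values of self-attention."""
    def __init__(self, dim, mem_size=64):
        super().__init__()
        self.mk = nn.Linear(dim, mem_size, bias=False)   # external key memory M_k
        self.mv = nn.Linear(mem_size, dim, bias=False)   # external value memory M_v

    def forward(self, x):                                # x: (batch, N, dim)
        attn = self.mk(x).softmax(dim=1)                 # normalize over the N tokens
        attn = attn / (attn.sum(dim=2, keepdim=True) + 1e-9)  # double normalization
        return self.mv(attn)                             # (batch, N, dim)

x = torch.randn(2, 196, 128)
print(ExternalAttention(128)(x).shape)                   # torch.Size([2, 196, 128])
```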

Mitchell Approximation for Binary Multiplication

Constructing addition-based neural networks with the Mitchell approximation.
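As a quick illustration of the underlying arithmetic (plain Python; function names are illustrative), Mitchell's method converts both operands to the log domain with a piecewise-linear approximation of log2, adds them, and converts back, so a multiplication is reduced to roughly one addition.

```python
import math

def mitchell_log2(x: float) -> float:
    """Piecewise-linear log2: for x = 2**k * (1 + f) with f in [0, 1), log2(x) ≈ k + f."""
    m, e = math.frexp(x)            # x = m * 2**e with m in [0.5, 1); x must be positive
    k, f = e - 1, 2.0 * m - 1.0     # rewrite as 2**k * (1 + f)
    return k + f

def mitchell_exp2(y: float) -> float:
    """Inverse piecewise-linear approximation: 2**(k + f) ≈ 2**k * (1 + f)."""
    k = math.floor(y)
    return math.ldexp(1.0 + (y - k), k)

def mitchell_mul(a: float, b: float) -> float:
    """Approximate a * b as an addition in the (approximate) log domain."""
    return mitchell_exp2(mitchell_log2(a) + mitchell_log2(b))

print(mitchell_mul(13, 9))          # ≈ 112.0 (exact: 117); error is bounded by ~11%
```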

ResMLP: Feedforward networks for image classification with data-efficient training

ResMLP: a fully connected image classification network with data-efficient training.

Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet

Replacing the self-attention layers in ViT with feed-forward (fully connected) layers.

MLP-Mixer: An all-MLP Architecture for Vision

MLP-Mixer: a vision model built entirely from fully connected (MLP) layers.
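The three MLP-only entries above (ResMLP, the feed-forward ViT, MLP-Mixer) share the same pattern of alternating token-mixing and channel-mixing layers; a minimal sketch of one Mixer block (assuming PyTorch; hidden sizes are illustrative) is shown below.

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """One MLP-Mixer block: a token-mixing MLP applied across patches, followed by
    a channel-mixing MLP applied per patch; no self-attention anywhere."""
    def __init__(self, num_patches, dim, token_hidden=256, channel_hidden=512):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(                  # mixes information across patches
            nn.Linear(num_patches, token_hidden), nn.GELU(),
            nn.Linear(token_hidden, num_patches))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(                # mixes information across channels
            nn.Linear(dim, channel_hidden), nn.GELU(),
            nn.Linear(channel_hidden, dim))

    def forward(self, x):                                # x: (batch, num_patches, dim)
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + self.channel_mlp(self.norm2(x))
        return x

x = torch.randn(2, 196, 128)                             # e.g. 14x14 patches, 128 channels
print(MixerBlock(196, 128)(x).shape)                     # torch.Size([2, 196, 128])
```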