MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

Hungry Hungry Hippos: Towards Language Modeling with State Space Models (H3)

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

On the Parameterization and Initialization of Diagonal State Space Models

Simplified State Space Layers for Sequence Modeling

Diagonal State Spaces are as Effective as Structured State Spaces
