Simple Hardware-Efficient Long Convolutions for Sequence Modeling
LaughingHyena: Extracting Compact Recurrences from Convolutions.
Hyena: Towards Larger Convolutional Language Models.
Large Language Model.
(Shanghai Chapter) Shanghai: Shanghai Tap Water Comes From the Sea.
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN.