The Mathematical Relationship Between Layer Normalization and Dynamic Activation Functions

Natural History Notes: Telling Apart the Flowers of Mei (Prunus mume), Apricot, Peach, Cherry, Plum, Pear, and Crabapple

Transformers without Normalization

Systems and Algorithms for Convolutional Multi-Hybrid Language Models at Scale

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

The Curse of Depth in Large Language Models
