Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions.
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN.