SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features