---
license: apache-2.0
tags:
- vision
---

# SigLIP 2 Base

[SigLIP 2](https://huggingface.co/collections/google/siglip2-67b5dcef38c175486e240107) extends the pretraining objective of [SigLIP](https://huggingface.co/collections/google/siglip-659d5e62f0ae1a57ae0e83ba) by incorporating prior, independently developed techniques into a unified recipe, for improved semantic understanding, localization, and dense features.

## Intended uses

You can use the raw model for tasks like zero-shot image classification and image-text retrieval, or as a vision encoder for VLMs (and other vision tasks). A minimal usage sketch appears at the end of this card.

## Training procedure

SigLIP 2 adds some clever training objectives on top of SigLIP:

1. Decoder loss
2. Global-local and masked prediction loss
3. Aspect ratio and resolution adaptability

### Training data

SigLIP 2 is pre-trained on the WebLI dataset [(Chen et al., 2023)](https://arxiv.org/abs/2209.06794).

### Compute

The model was trained on up to 2048 TPU-v5e chips.

## Evaluation results

Evaluation of SigLIP 2 is shown below (taken from the paper).

[Evaluation Table](TODO)

### BibTeX entry and citation info

```bibtex
TODO
```
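
### Usage sketch

The snippet below is a minimal sketch of zero-shot image classification with this model via the `transformers` library. The checkpoint name (`google/siglip2-base-patch16-224`), the `padding="max_length"` argument, and the prompt template are assumptions following the standard SigLIP usage pattern and may need adjusting for the exact SigLIP 2 Base checkpoint you load.

```python
import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

# Assumed checkpoint name; substitute the SigLIP 2 Base variant you actually use.
ckpt = "google/siglip2-base-patch16-224"
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

# Example image (two cats) from the COCO validation set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

candidate_labels = ["2 cats", "2 dogs", "a plane", "a remote"]
texts = [f"This is a photo of {label}." for label in candidate_labels]

# SigLIP-style processors typically pad text to max_length.
inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# SigLIP trains with a sigmoid (not softmax) loss, so each label gets an independent score.
probs = torch.sigmoid(outputs.logits_per_image)
for label, p in zip(candidate_labels, probs[0]):
    print(f"{p.item():.1%} that the image is '{label}'")
```

Because the scores are independent sigmoids rather than a softmax over labels, they do not sum to one; thresholds for retrieval or classification should be chosen per use case.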