SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features • Paper • arXiv:2502.14786
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior • Paper • arXiv:2310.16818 • Published Oct 25, 2023
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling • Paper • arXiv:2501.17811
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention • Paper • arXiv:2502.11089
Llasa Collection • TTS foundation model compatible with the Llama framework (160k hours of tokenized speech data released) • 11 items
Step-Audio Collection • Step-Audio model family, including Audio-Tokenizer, Audio-Chat, and TTS • 3 items
OLMoE (January 2025) Collection • Improved OLMoE for the iOS app. Read more: https://allenai.org/blog/olmoe-app • 10 items
Ovis2 Collection • Our latest advancement in multimodal large language models (MLLMs) • 8 items
Granite Experiments Collection • Experimental projects under consideration for the Granite family • 6 items
Hibiki fr-en Collection • Hibiki is a model for streaming speech translation that can run on device. See https://github.com/kyutai-labs/hibiki • 5 items
QLIP Collection • QLIP is a family of image tokenizers with state-of-the-art reconstruction quality and zero-shot image understanding • 3 items
Tulu 3 Models Collection • All models released with Tulu 3, featuring state-of-the-art open post-training recipes • 11 items