Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 7 days ago • 133
SYNTHETIC-1 Collection A collection of tasks & verifiers for reasoning datasets • 9 items • Updated 2 days ago • 46
Hibiki fr-en Collection Hibiki is a model for streaming speech translation, which can run on device! See https://github.com/kyutai-labs/hibiki. • 5 items • Updated 17 days ago • 49
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 18 days ago • 188
Article KV Caching Explained: Optimizing Transformer Inference Efficiency By not-lain • 24 days ago • 31
Article Releasing the largest multilingual open pretraining dataset By Pclanglais and 2 others • Nov 13, 2024 • 99
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published Oct 28, 2024 • 78
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated 3 days ago • 239
Pyramidal Flow Matching for Efficient Video Generative Modeling Paper • 2410.05954 • Published Oct 8, 2024 • 39
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 125