Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 7 days ago • 133
Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control 20 days ago • 106
From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control Paper • 2405.04798 • Published May 8, 2024 • 1
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction Paper • 2502.07316 • Published 12 days ago • 44
Article Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference Jan 16 • 68
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 19 days ago • 190
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Paper • 2501.16975 • Published 26 days ago • 26
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch Paper • 2501.18512 • Published 24 days ago • 27
Article Mastering Long Contexts in LLMs with KVPress By nvidia and 1 other • Jan 23 • 63
Article Fine-tune ModernBERT for RAG with Synthetic Data By sdiazlor and 2 others • Jan 20 • 36
Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI Paper • 2409.14160 • Published Sep 21, 2024 • 2