SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 3 days ago • 98
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Paper • 2502.04328 • Published 17 days ago • 26
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published 12 days ago • 43
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 7 days ago • 133
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance Paper • 2502.08127 • Published 11 days ago • 49
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 19 days ago • 190
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated 3 days ago • 239
view article Article Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial By open-r1 • 23 days ago • 36
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion Paper • 2403.18818 • Published Mar 27, 2024 • 26
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 329
view article Article PEFT: Parameter-Efficient Fine-Tuning Methods for LLMs By samuellimabraz • 30 days ago • 12
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published Apr 25, 2024 • 77