SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 3 days ago
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 7 days ago
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 19 days ago
Reward-Guided Speculative Decoding for Efficient LLM Reasoning Paper • 2501.19324 • Published 23 days ago
Optimizing Large Language Model Training Using FP4 Quantization Paper • 2501.17116 • Published 26 days ago
Temporal Preference Optimization (Collection): Temporal Preference Optimization for Long-form Video Understanding • 3 items • Updated Jan 19
Temporal Preference Optimization for Long-Form Video Understanding Paper • 2501.13919 • Published about 1 month ago
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published Jan 22
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos Paper • 2501.09781 • Published Jan 16