MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published 3 days ago • 147
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 3 days ago • 99
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Paper • 2502.14846 • Published 3 days ago • 13
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published 3 days ago • 87
view article Article Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥 6 days ago • 87
view article Article PaliGemma 2 Mix - New Instruction Vision Language Models by Google 5 days ago • 53
Crowd Comparative Reasoning: Unlocking Comprehensive Evaluations for LLM-as-a-Judge Paper • 2502.12501 • Published 6 days ago • 5
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation Paper • 2502.09838 • Published 10 days ago • 9
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm Paper • 2502.12513 • Published 5 days ago • 15
Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation Paper • 2502.13145 • Published 5 days ago • 35
SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models Paper • 2502.12464 • Published 6 days ago • 26
Rethinking Diverse Human Preference Learning through Principal Component Analysis Paper • 2502.13131 • Published 5 days ago • 34
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation Paper • 2502.13143 • Published 5 days ago • 28
Phantom: Subject-consistent video generation via cross-modal alignment Paper • 2502.11079 • Published 7 days ago • 49