MLLM-as-a-Judge for Image Safety without Human Labeling Paper • 2501.00192 • Published Dec 31, 2024 • 25
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 99
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published Dec 25, 2024 • 97
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published Jan 2 • 49
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14 • 273
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models Paper • 2501.09686 • Published Jan 16 • 36
PaSa: An LLM Agent for Comprehensive Academic Paper Search Paper • 2501.10120 • Published Jan 17 • 43
GuardReasoner: Towards Reasoning-based LLM Safeguards Paper • 2501.18492 • Published 24 days ago • 81
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training Paper • 2501.18511 • Published 24 days ago • 19
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published 13 days ago • 133
Expect the Unexpected: FailSafe Long Context QA for Finance Paper • 2502.06329 • Published 13 days ago • 122
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation Paper • 2502.07870 • Published 12 days ago • 42
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! Paper • 2502.07374 • Published 12 days ago • 33
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance Paper • 2502.08127 • Published 11 days ago • 49
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models Paper • 2502.07346 • Published 12 days ago • 49
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published 12 days ago • 43
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model Paper • 2502.10248 • Published 9 days ago • 49
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? Paper • 2502.12115 • Published 6 days ago • 41
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published 3 days ago • 145
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 3 days ago • 97
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published 3 days ago • 87