SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 3 days ago • 99
SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering? Paper • 2502.13233 • Published 5 days ago • 11
Baichuan-M1: Pushing the Medical Capability of Large Language Models Paper • 2502.12671 • Published 5 days ago • 1
Scaling Test-Time Compute Without Verification or RL is Suboptimal Paper • 2502.12118 • Published 6 days ago • 1
Soundwave: Less is More for Speech-Text Alignment in LLMs Paper • 2502.12900 • Published 5 days ago • 72
Is Noise Conditioning Necessary for Denoising Generative Models? Paper • 2502.13129 • Published 5 days ago • 1
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario Paper • 2501.10132 • Published Jan 17 • 19
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published Jan 22 • 83
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published 25 days ago • 55
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 19 days ago • 190
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published 13 days ago • 134
Scaling Pre-training to One Hundred Billion Data for Vision Language Models Paper • 2502.07617 • Published 12 days ago • 27
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published 11 days ago • 139
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment Paper • 2502.10391 • Published 9 days ago • 29