-
Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 106 -
PaSa: An LLM Agent for Comprehensive Academic Paper Search
Paper • 2501.10120 • Published • 43 -
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
Paper • 2501.09775 • Published • 29 -
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
Paper • 2501.10132 • Published • 19
Collections
Discover the best community collections!
Collections including paper arxiv:2502.01100
-
Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial?
Paper • 2502.00674 • Published • 12 -
Demystifying Long Chain-of-Thought Reasoning in LLMs
Paper • 2502.03373 • Published • 51 -
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper • 2502.02737 • Published • 190 -
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
Paper • 2502.01142 • Published • 23
-
MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models
Paper • 2502.00698 • Published • 23 -
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
Paper • 2502.01142 • Published • 23 -
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
Paper • 2502.01100 • Published • 15 -
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles
Paper • 2502.01081 • Published • 14
-
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
Paper • 2501.09751 • Published • 47 -
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper • 2501.09686 • Published • 36 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 330 -
s1: Simple test-time scaling
Paper • 2501.19393 • Published • 105
-
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 192 -
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Paper • 2311.16502 • Published • 35 -
BLINK: Multimodal Large Language Models Can See but Not Perceive
Paper • 2404.12390 • Published • 26 -
RULER: What's the Real Context Size of Your Long-Context Language Models?
Paper • 2404.06654 • Published • 35