Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2502.01100

Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published Jan 17 • 106
PaSa: An LLM Agent for Comprehensive Academic Paper Search

Paper • 2501.10120 • Published Jan 17 • 43
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong

Paper • 2501.09775 • Published Jan 16 • 29
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario

Paper • 2501.10132 • Published Jan 17 • 19

Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial?

Paper • 2502.00674 • Published 22 days ago • 12
Demystifying Long Chain-of-Thought Reasoning in LLMs

Paper • 2502.03373 • Published 18 days ago • 51
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 19 days ago • 190
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models

Paper • 2502.01142 • Published 21 days ago • 23

MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models

Paper • 2502.00698 • Published 22 days ago • 23
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models

Paper • 2502.01142 • Published 21 days ago • 23
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning

Paper • 2502.01100 • Published 21 days ago • 15
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles

Paper • 2502.01081 • Published 21 days ago • 14

OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking

Paper • 2501.09751 • Published Jan 16 • 47
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

Paper • 2501.09686 • Published Jan 16 • 36
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 330
s1: Simple test-time scaling

Paper • 2501.19393 • Published 23 days ago • 105

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 192
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Paper • 2311.16502 • Published Nov 27, 2023 • 35
BLINK: Multimodal Large Language Models Can See but Not Perceive

Paper • 2404.12390 • Published Apr 18, 2024 • 26
RULER: What's the Real Context Size of Your Long-Context Language Models?

Paper • 2404.06654 • Published Apr 9, 2024 • 35

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs