-
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
Paper • 2408.07199 • Published • 21 -
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 76 -
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning
Paper • 2406.12050 • Published • 19 -
Let's Verify Step by Step
Paper • 2305.20050 • Published • 10
Collections
Discover the best community collections!
Collections including paper arxiv:2408.06195
-
Training Software Engineering Agents and Verifiers with SWE-Gym
Paper • 2412.21139 • Published • 22 -
Evaluating Language Models as Synthetic Data Generators
Paper • 2412.03679 • Published • 48 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 146 -
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 115
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 6 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 21 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 13 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 146 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 13 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 54 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 47
-
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper • 2501.04519 • Published • 257 -
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though
Paper • 2501.04682 • Published • 90 -
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Paper • 2410.02884 • Published • 54 -
Think Before You Speak: Cultivating Communication Skills of Large Language Models via Inner Monologue
Paper • 2311.07445 • Published
-
Phi-4 Technical Report
Paper • 2412.08905 • Published • 107 -
Evaluating and Aligning CodeLLMs on Human Preference
Paper • 2412.05210 • Published • 47 -
Evaluating Language Models as Synthetic Data Generators
Paper • 2412.03679 • Published • 48 -
Yi-Lightning Technical Report
Paper • 2412.01253 • Published • 27
-
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 70 -
Thinking LLMs: General Instruction Following with Thought Generation
Paper • 2410.10630 • Published • 19 -
Democratizing Reasoning Ability: Tailored Learning from Large Language Model
Paper • 2310.13332 • Published • 15 -
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
Paper • 2412.16849 • Published • 9
-
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 70 -
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 137 -
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 55 -
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
Paper • 2405.06682 • Published • 3