Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving Paper • 2507.06229 • Published Jul 8 • 73
Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders Article • By thomwolf and 1 other • Jul 9 • 667
Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks Paper • 2404.00376 • Published Mar 30, 2024 • 5
SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning Paper • 2506.21355 • Published Jun 26 • 9
MARBLE: A Hard Benchmark for Multimodal Spatial Reasoning and Planning Paper • 2506.22992 • Published Jun 28 • 12
Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models Paper • 2401.15269 • Published Jan 27, 2024 • 2
Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards Paper • 2506.11474 • Published Jun 13 • 18
MIRIAD: Augmenting LLMs with millions of medical query-response pairs Paper • 2506.06091 • Published Jun 6 • 9
Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science Paper • 2402.04247 • Published Feb 6, 2024 • 2
ChatCell: Facilitating Single-Cell Analysis with Natural Language Paper • 2402.08303 • Published Feb 13, 2024 • 14
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23, 2024 • 74
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning Paper • 2501.06590 • Published Jan 11 • 11
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published Jan 21 • 86
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents Paper • 2503.01935 • Published Mar 3 • 27
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning Paper • 2503.07459 • Published Mar 10 • 16