-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 146 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 13 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 54 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 47
Collections
Discover the best community collections!
Collections including paper arxiv:2408.14354
-
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
Paper • 2404.03543 • Published • 16 -
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Paper • 2406.11931 • Published • 63 -
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
Paper • 2407.18901 • Published • 33 -
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
Paper • 2408.07060 • Published • 42
-
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Paper • 2412.14161 • Published • 51 -
Training Software Engineering Agents and Verifiers with SWE-Gym
Paper • 2412.21139 • Published • 22 -
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Paper • 2412.19723 • Published • 82 -
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
Paper • 2408.00764 • Published • 1
-
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 192 -
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Paper • 2311.16502 • Published • 35 -
BLINK: Multimodal Large Language Models Can See but Not Perceive
Paper • 2404.12390 • Published • 26 -
RULER: What's the Real Context Size of Your Long-Context Language Models?
Paper • 2404.06654 • Published • 35