SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published 3 days ago • 88
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published Nov 7, 2024 • 115
M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation Paper • 2410.21157 • Published Oct 28, 2024 • 6
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding Paper • 2409.03420 • Published Sep 5, 2024 • 26
FuzzCoder: Byte-level Fuzzing Test via Large Language Model Paper • 2409.01944 • Published Sep 3, 2024 • 45
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering Paper • 2408.09174 • Published Aug 17, 2024 • 52
Running on CPU Upgrade 12.6k 12.6k Open LLM Leaderboard 🏆 Track, rank and evaluate open LLMs and chatbots