Running 112 112 Open-LLM performances are plateauing, let’s make the leaderboard steep again 🏔 Update leaderboard for fair model evaluation