Article ZebraLogic: Benchmarking the Logical Reasoning Ability of Language Models By yuchenlin • Jul 27, 2024 • 31
IndicGenBench Collection Datasets released in "IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs" (https://arxiv.org/abs/2404.16816) • 4 items • Updated Dec 13, 2024 • 8
Article PaliGemma 2 Mix - New Instruction Vision Language Models by Google • 5 days ago • 53
VHELM: A Holistic Evaluation of Vision Language Models Paper • 2410.07112 • Published Oct 9, 2024 • 3
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published 13 days ago • 133
Article Darija Chatbot Arena: Making LLMs Compete in the Moroccan Dialect By atlasia and 2 others • 13 days ago • 10
Leaderboards for Arabic Collection A collection for all leaderboards related to the Arabic Language. • 5 items • Updated 9 days ago • 2
Article Arabic RAG Leaderboard: A Comprehensive Framework for Evaluating Arabic Language Retrieval Systems By Navid-AI and 1 other • 14 days ago • 11