Benchmarks - an aishiknagar Collection

aishiknagar's Collections

Predictive and Classification tasks

LLMs for evaluation and Judge models

Analysis papers

Positions and Surveys

Benchmarks

updated May 27

FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow

Paper • 2505.17399 • Published May 23 • 14

Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles

Paper • 2505.19914 • Published May 26 • 44