Gemstones: A Model Suite for Multi-Faceted Scaling Laws Paper • 2502.06857 • Published 16 days ago • 23
LiveBench: A Challenging, Contamination-Free LLM Benchmark Paper • 2406.19314 • Published Jun 27, 2024 • 23
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models Paper • 2306.13651 • Published Jun 23, 2023 • 15
On the Reliability of Watermarks for Large Language Models Paper • 2306.04634 • Published Jun 7, 2023 • 5