Towards Reliable Testing for Multiple Information Retrieval System Comparisons Paper • 2501.03930 • Published Jan 7, 2025
Limitations of Automatic Relevance Assessments with Large Language Models for Fair and Reliable Retrieval Evaluation Paper • 2411.13212 • Published Nov 20, 2024
How Discriminative Are Your Qrels? How To Study the Statistical Significance of Document Adjudication Methods Paper • 2308.09340 • Published Aug 18, 2023