PuzzlePlex: Benchmarking Foundation Models on Reasoning and Planning with Puzzles Paper • 2510.06475 • Published Oct 7, 2025
FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering Paper • 2510.06426 • Published Oct 7, 2025