FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games Paper • 2509.01052 • Published 5 days ago • 18 • 1
ChartCap: Mitigating Hallucination of Dense Chart Captioning Paper • 2508.03164 • Published Aug 5 • 6 • 2
Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates Paper • 2505.22943 • Published May 28 • 4 • 4
Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates Paper • 2505.22943 • Published May 28 • 4 • 4
Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates Paper • 2505.22943 • Published May 28 • 4 • 4