CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning • Paper • 2401.14011 • Published Jan 25, 2024
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs • Paper • 2505.11842 • Published May 17, 2025
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling • Paper • 2511.11793 • Published Nov 14, 2025
Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench • Paper • 2510.26865 • Published Oct 30, 2025