OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Paper • 2510.24411 • Published Oct 28 • 71
OmniLayout: Enabling Coarse-to-Fine Learning with LLMs for Universal Document Layout Generation Paper • 2510.26213 • Published Oct 30 • 9
Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition Paper • 2510.01068 • Published Oct 1 • 19
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation Paper • 2510.00515 • Published Oct 1 • 39
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition Paper • 2404.15254 • Published Apr 23, 2024 • 1
CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation Paper • 2409.03643 • Published Sep 5, 2024 • 19
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26 • 139
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs Paper • 2507.11097 • Published Jul 15 • 64
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows Paper • 2505.19897 • Published May 26 • 104
Shifting AI Efficiency From Model-Centric to Data-Centric Compression Paper • 2505.19147 • Published May 25 • 144
LEGION: Learning to Ground and Explain for Synthetic Image Detection Paper • 2503.15264 • Published Mar 19 • 21