ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries • Paper • 2511.14349 • Published Nov 18 • 17
ActiveVLN: Towards Active Exploration via Multi-Turn RL in Vision-and-Language Navigation • Paper • 2509.12618 • Published Sep 16 • 1
From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model • Paper • 2510.19871 • Published Oct 22 • 29