EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control Paper • 2508.21112 • Published 10 days ago • 72
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning Paper • 2508.20751 • Published 10 days ago • 85
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies Paper • 2508.20072 • Published 11 days ago • 28
Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation Paper • 2508.13998 • Published 19 days ago • 16
LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos Paper • 2508.14041 • Published 19 days ago • 57
MolmoAct: Action Reasoning Models that can Reason in Space Paper • 2508.07917 • Published 27 days ago • 41
Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning Paper • 2507.17512 • Published Jul 23 • 36
VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models Paper • 2503.21781 • Published Mar 27
MotionMatcher: Motion Customization of Text-to-Video Diffusion Models via Motion Feature Matching Paper • 2502.13234 • Published Feb 18
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Paper • 2507.16815 • Published Jul 22 • 38
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Paper • 2507.16815 • Published Jul 22 • 38
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Paper • 2507.16815 • Published Jul 22 • 38 • 1
"PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models Paper • 2507.13428 • Published Jul 17 • 15
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning Paper • 2507.12841 • Published Jul 17 • 41
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17 • 73