Droplet3D: Commonsense Priors from Videos Facilitate 3D Generation Paper • 2508.20470 • Published 11 days ago • 64
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Paper • 2503.16419 • Published Mar 20, 2025 • 76
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL Paper • 2503.07536 • Published Mar 10, 2025 • 89
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation Paper • 2503.06053 • Published Mar 8, 2025 • 138
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2411.14432 • Published Nov 21, 2024 • 26
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline Paper • 2411.12814 • Published Nov 19, 2024 • 26
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations Paper • 2411.10818 • Published Nov 16, 2024 • 27
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models Paper • 2411.13503 • Published Nov 20, 2024 • 35
Style-Friendly SNR Sampler for Style-Driven Generation Paper • 2411.14793 • Published Nov 22, 2024 • 40
Material Anything: Generating Materials for Any 3D Object via Diffusion Paper • 2411.15138 • Published Nov 22, 2024 • 51
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents Paper • 2410.24024 • Published Oct 31, 2024 • 51
Learning Flow Fields in Attention for Controllable Person Image Generation Paper • 2412.08486 • Published Dec 11, 2024 • 37
Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS Paper • 2411.18478 • Published Nov 27, 2024 • 38
Bringing Objects to Life: 4D generation from 3D objects Paper • 2412.20422 • Published Dec 29, 2024 • 42
No More Adam: Learning Rate Scaling at Initialization is All You Need Paper • 2412.11768 • Published Dec 16, 2024 • 44
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment Paper • 2412.04814 • Published Dec 6, 2024 • 48