DragAnything: Motion Control for Anything using Entity Representation Paper • 2403.07420 • Published Mar 12, 2024
Learning Multi-dimensional Human Preference for Text-to-Image Generation Paper • 2405.14705 • Published May 23, 2024
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation Paper • 2406.10462 • Published Jun 15, 2024
Decouple Content and Motion for Conditional Image-to-Video Generation Paper • 2311.14294 • Published Nov 24, 2023
Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization Paper • 2502.01051 • Published Feb 3, 2025
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment Paper • 2502.10391 • Published Feb 14, 2025
Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning Paper • 2505.21067 • Published May 27, 2025
TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types Paper • 2502.09925 • Published Feb 14, 2025
Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models Paper • 2504.08809 • Published Apr 9, 2025