Submitted by LXT · 45 · DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation · 6 authors · 4
Submitted by JamesHujy · 30 · ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer · 9 authors · 2
Submitted by kasraarabi · 28 · Hidden in the Noise: Two-Stage Robust Watermarking for Images · 5 authors · 2
Submitted by xichenhku · 28 · UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics · 13 authors · 4
Submitted by wanderkid · 22 · OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations · 20 authors · 3
Submitted by myownskyW7 · 20 · FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models · 9 authors · 2
Submitted by lemonaddie · 18 · 3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation · 10 authors · 2
Submitted by shuaishuaicdp · 17 · Perception Tokens Enhance Visual Reasoning in Multimodal Language Models · 7 authors · 2
Submitted by pvalois · 16 · Frame Representation Hypothesis: Multi-Token LLM Interpretability and Concept-Guided Text Generation · 4 authors · 4
Submitted by donaldssh · 11 · LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation · 5 authors · 3
Submitted by chunwei0224 · 11 · ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance · 8 authors · 2
Submitted by renqiux0302 · 9 · Chimera: Improving Generalist Model with Domain-Specific Experts · 14 authors · 2
Submitted by aggr8 · 4 · GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis · 6 authors · 2
Submitted by alemiaschi · 2 · Contextualized Counterspeech: Strategies for Adaptation, Personalization, and Evaluation · 6 authors · 2
Submitted by thomasrantian · 2 · Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment · 6 authors · 2
Submitted by gpx333 · 2 · A New Federated Learning Framework Against Gradient Inversion Attacks · 7 authors · 2