SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation Paper • 2502.13128 • Published 5 days ago • 34
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation Paper • 2502.07870 • Published 12 days ago • 42
On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices Paper • 2502.04363 • Published 18 days ago • 11
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces Paper • 2501.12909 • Published Jan 22 • 68
view article Article The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about... By srinivasbilla • Jan 20 • 61
FashionComposer: Compositional Fashion Image Generation Paper • 2412.14168 • Published Dec 18, 2024 • 16
Autoregressive Video Generation without Vector Quantization Paper • 2412.14169 • Published Dec 18, 2024 • 14
Arbitrary-steps Image Super-resolution via Diffusion Inversion Paper • 2412.09013 • Published Dec 12, 2024 • 13
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion Paper • 2412.09626 • Published Dec 12, 2024 • 20
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training Paper • 2412.09619 • Published Dec 12, 2024 • 24
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics Paper • 2412.07774 • Published Dec 10, 2024 • 28
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Paper • 2412.07589 • Published Dec 10, 2024 • 45
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published Dec 5, 2024 • 60
TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models Paper • 2411.18350 • Published Nov 27, 2024 • 26
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models Paper • 2411.18613 • Published Nov 27, 2024 • 52
Pathways on the Image Manifold: Image Editing via Video Generation Paper • 2411.16819 • Published Nov 25, 2024 • 33
VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement Paper • 2411.15115 • Published Nov 22, 2024 • 9