Malav Warke

malavwarke

https://www.creatosaurus.io/

AI & ML interests

https://www.creatosaurus.io/

malavwarke's activity

upvoted a paper 2 days ago

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

Paper • 2502.13128 • Published 5 days ago • 34

upvoted a paper 10 days ago

TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation

Paper • 2502.07870 • Published 12 days ago • 42

upvoted a paper 12 days ago

On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices

Paper • 2502.04363 • Published 18 days ago • 11

upvoted an article 19 days ago

Open-source DeepResearch – Freeing our search agents

Article • 20 days ago • 1.08k

upvoted a paper 28 days ago

FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces

Paper • 2501.12909 • Published Jan 22 • 68

upvoted 2 articles about 1 month ago

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

Article • Jan 23 • 142

The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about...

Article • Jan 20 • 61

upvoted a paper about 2 months ago

Edicho: Consistent Image Editing in the Wild

Paper • 2412.21079 • Published Dec 30, 2024 • 23

upvoted 7 papers 2 months ago

FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion

Paper • 2412.09626 • Published Dec 12, 2024 • 20

SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

Paper • 2412.09619 • Published Dec 12, 2024 • 24

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics

Paper • 2412.07774 • Published Dec 10, 2024 • 28

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published Dec 10, 2024 • 45

upvoted 5 papers 3 months ago

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Paper • 2412.04424 • Published Dec 5, 2024 • 60

TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models

Paper • 2411.18350 • Published Nov 27, 2024 • 26

CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models

Paper • 2411.18613 • Published Nov 27, 2024 • 52

Pathways on the Image Manifold: Image Editing via Video Generation

Paper • 2411.16819 • Published Nov 25, 2024 • 33

VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement

Paper • 2411.15115 • Published Nov 22, 2024 • 9