# DrUM (Draw Your Mind)
DrUM enables personalized text-to-image (T2I) generation by integrating reference prompts into T2I diffusion models. It works with foundation T2I models such as Stable Diffusion v1/v2/XL/v3 and FLUX, without requiring additional fine-tuning. DrUM leverages condition-level modeling in the latent space using a transformer-based adapter, and integrates seamlessly with open-source text encoders such as OpenCLIP and Google T5.
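Conceptually, DrUM personalizes generation at the condition (text-embedding) level rather than by fine-tuning model weights. The sketch below only illustrates that idea; the actual adapter is a trained transformer, not this naive interpolation, and every name in it is hypothetical:

```python
# Illustration only: DrUM's real adapter is a trained transformer operating on
# text-encoder conditions. This naive weighted interpolation just pictures the
# condition-level idea; all function and argument names here are hypothetical.
import torch

def blend_conditions(prompt_emb: torch.Tensor,
                     ref_embs: list[torch.Tensor],
                     weights: list[float],
                     alpha: float) -> torch.Tensor:
    # Weighted average of the reference embeddings
    ref = sum(w * e for w, e in zip(weights, ref_embs)) / sum(weights)
    # alpha steers between the original prompt (0.0) and the references (1.0)
    return (1 - alpha) * prompt_emb + alpha * ref
```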
This repository provides the necessary components to run DrUM for inference. For the full source code, training scripts, and detailed documentation, please visit our official GitHub repository and read the research paper.
## Quickstart

This model is designed for easy use with the `diffusers` library as a custom pipeline.
### Installation

```bash
pip install torch torchvision diffusers transformers accelerate safetensors huggingface-hub
```
### Usage

```python
import torch
from diffusers import DiffusionPipeline
from pipeline import DrUM

# Load a foundation T2I pipeline and attach DrUM
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.bfloat16
).to("cuda")
drum = DrUM(pipeline)

# Alternatively, load DrUM directly as a custom pipeline from the Hub:
# drum = DiffusionPipeline.from_pretrained(
#     "runwayml/stable-diffusion-v1-5", custom_pipeline="Burf/DrUM",
#     pipeline="runwayml/stable-diffusion-v1-5",
#     torch_dtype=torch.bfloat16, device="cuda",
# )

# Generate personalized images from a prompt and a weighted reference prompt
images = drum(
    prompt="a photograph of an astronaut riding a horse",
    ref=["A retro-futuristic space exploration movie poster with bold, vibrant colors"],
    weight=[1.0],
    alpha=0.3,
)
images[0].save("personalized_image.png")
```
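Multiple reference prompts can be combined. Judging from the signature above, `ref` and `weight` appear to take parallel lists, with `alpha` controlling the overall personalization strength; a sketch under that assumption:

```python
# Blend two reference styles with unequal weights (assumes ref/weight are
# parallel lists, as the quickstart signature suggests); a lower alpha keeps
# the result closer to the original prompt.
images = drum(
    prompt="a photograph of an astronaut riding a horse",
    ref=[
        "A retro-futuristic space exploration movie poster with bold, vibrant colors",
        "A soft watercolor illustration with pastel tones",
    ],
    weight=[0.7, 0.3],
    alpha=0.3,
)
images[0].save("personalized_image_blended.png")
```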
## Supported foundation T2I models

DrUM works with a wide variety of foundation T2I models that share the same text encoder weights, as shown in the table below:
| Architecture | Pipeline | Text encoder | DrUM weight |
|---|---|---|---|
| Stable Diffusion v1 | `runwayml/stable-diffusion-v1-5`, `prompthero/openjourney-v4`, `stablediffusionapi/realistic-vision-v51`, `stablediffusionapi/deliberate-v2`, `stablediffusionapi/anything-v5`, `WarriorMama777/AbyssOrangeMix2`, ... | `openai/clip-vit-large-patch14` | `L.safetensors` |
| Stable Diffusion v2 | `stabilityai/stable-diffusion-2-1`, ... | `openai/clip-vit-huge-patch14` | `H.safetensors` |
| Stable Diffusion XL | `stabilityai/stable-diffusion-xl-base-1.0`, ... | `openai/clip-vit-large-patch14`, `laion/CLIP-ViT-bigG-14-laion2B-39B-b160k` | `L.safetensors`, `bigG.safetensors` |
| Stable Diffusion v3 | `stabilityai/stable-diffusion-3.5-large`, `stabilityai/stable-diffusion-3.5-medium`, ... | `openai/clip-vit-large-patch14`, `laion/CLIP-ViT-bigG-14-laion2B-39B-b160k`, `google/t5-v1_1-xxl` | `L.safetensors`, `bigG.safetensors`, `T5.safetensors` |
| FLUX | `black-forest-labs/FLUX.1-dev`, ... | `openai/clip-vit-large-patch14`, `google/t5-v1_1-xxl` | `L.safetensors`, `T5.safetensors` |
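Switching architectures should only require changing the base checkpoint. The sketch below assumes DrUM resolves the matching adapter weights (here `L.safetensors` and `bigG.safetensors` for SDXL's two text encoders) on its own:

```python
import torch
from diffusers import DiffusionPipeline
from pipeline import DrUM

# Same attach pattern with Stable Diffusion XL (assumes DrUM picks up the
# adapter weights matching the pipeline's text encoders automatically).
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.bfloat16
).to("cuda")
drum = DrUM(pipeline)
```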
## Citation

```bibtex
@inproceedings{kim2025drum,
    title={Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models},
    author={Hyungjin Kim and Seokho Ahn and Young-Duk Seo},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    year={2025}
}
```
## License

This project is licensed under the MIT License.