# DrUM (Draw Your Mind)
DrUM enables personalized text-to-image (T2I) generation by integrating reference prompts into T2I diffusion models. It works with foundation T2I models such as Stable Diffusion v1/v2/XL/v3 and FLUX, without requiring additional fine-tuning. DrUM leverages condition-level modeling in the latent space using a transformer-based adapter, and integrates seamlessly with open-source text encoders such as OpenCLIP and Google T5.
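Conceptually, DrUM personalizes generation at the condition (text-embedding) level rather than by fine-tuning model weights. The sketch below only illustrates that idea; the actual adapter is a trained transformer, not this naive interpolation, and every name in it is hypothetical:

```python
# Illustration only: DrUM's real adapter is a trained transformer operating on
# text-encoder conditions. This naive weighted interpolation just pictures the
# condition-level idea; all function and argument names here are hypothetical.
import torch

def blend_conditions(prompt_emb: torch.Tensor,
                     ref_embs: list[torch.Tensor],
                     weights: list[float],
                     alpha: float) -> torch.Tensor:
    # Weighted average of the reference embeddings
    ref = sum(w * e for w, e in zip(weights, ref_embs)) / sum(weights)
    # alpha steers between the original prompt (0.0) and the references (1.0)
    return (1 - alpha) * prompt_emb + alpha * ref
```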
This repository provides the necessary components to run DrUM for inference. For the full source code, training scripts, and detailed documentation, please visit our official GitHub repository and read the research paper.
## Quickstart

This model is designed for easy use with the `diffusers` library as a custom pipeline.
### Installation

```bash
pip install torch torchvision diffusers transformers accelerate safetensors huggingface-hub
```
### Usage

```python
import torch
from diffusers import DiffusionPipeline
from pipeline import DrUM

# Load a foundation T2I pipeline and attach DrUM
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.bfloat16
).to("cuda")
drum = DrUM(pipeline)

# Alternatively, load DrUM directly as a custom pipeline from the Hub:
# drum = DiffusionPipeline.from_pretrained(
#     "runwayml/stable-diffusion-v1-5", custom_pipeline="Burf/DrUM",
#     pipeline="runwayml/stable-diffusion-v1-5",
#     torch_dtype=torch.bfloat16, device="cuda",
# )

# Generate personalized images from a prompt and a weighted reference prompt
images = drum(
    prompt="a photograph of an astronaut riding a horse",
    ref=["A retro-futuristic space exploration movie poster with bold, vibrant colors"],
    weight=[1.0],
    alpha=0.3,
)
images[0].save("personalized_image.png")
```
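Multiple reference prompts can be combined. Judging from the signature above, `ref` and `weight` appear to take parallel lists, with `alpha` controlling the overall personalization strength; a sketch under that assumption:

```python
# Blend two reference styles with unequal weights (assumes ref/weight are
# parallel lists, as the quickstart signature suggests); a lower alpha keeps
# the result closer to the original prompt.
images = drum(
    prompt="a photograph of an astronaut riding a horse",
    ref=[
        "A retro-futuristic space exploration movie poster with bold, vibrant colors",
        "A soft watercolor illustration with pastel tones",
    ],
    weight=[0.7, 0.3],
    alpha=0.3,
)
images[0].save("personalized_image_blended.png")
```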
## Supported foundation T2I models

DrUM works with a wide variety of foundation T2I models that share the same text encoder weights, as shown in the table below:
| Architecture | Pipeline | Text encoder | DrUM weight |
|---|---|---|---|
| Stable Diffusion v1 | `runwayml/stable-diffusion-v1-5`, `prompthero/openjourney-v4`, `stablediffusionapi/realistic-vision-v51`, `stablediffusionapi/deliberate-v2`, `stablediffusionapi/anything-v5`, `WarriorMama777/AbyssOrangeMix2`, ... | `openai/clip-vit-large-patch14` | `L.safetensors` |
| Stable Diffusion v2 | `stabilityai/stable-diffusion-2-1`, ... | `openai/clip-vit-huge-patch14` | `H.safetensors` |
| Stable Diffusion XL | `stabilityai/stable-diffusion-xl-base-1.0`, ... | `openai/clip-vit-large-patch14`, `laion/CLIP-ViT-bigG-14-laion2B-39B-b160k` | `L.safetensors`, `bigG.safetensors` |
| Stable Diffusion v3 | `stabilityai/stable-diffusion-3.5-large`, `stabilityai/stable-diffusion-3.5-medium`, ... | `openai/clip-vit-large-patch14`, `laion/CLIP-ViT-bigG-14-laion2B-39B-b160k`, `google/t5-v1_1-xxl` | `L.safetensors`, `bigG.safetensors`, `T5.safetensors` |
| FLUX | `black-forest-labs/FLUX.1-dev`, ... | `openai/clip-vit-large-patch14`, `google/t5-v1_1-xxl` | `L.safetensors`, `T5.safetensors` |
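Switching architectures should only require changing the base checkpoint. The sketch below assumes DrUM resolves the matching adapter weights (here `L.safetensors` and `bigG.safetensors` for SDXL's two text encoders) on its own:

```python
import torch
from diffusers import DiffusionPipeline
from pipeline import DrUM

# Same attach pattern with Stable Diffusion XL (assumes DrUM picks up the
# adapter weights matching the pipeline's text encoders automatically).
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.bfloat16
).to("cuda")
drum = DrUM(pipeline)
```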
## Citation

```bibtex
@inproceedings{kim2025drum,
    title={Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models},
    author={Hyungjin Kim and Seokho Ahn and Young-Duk Seo},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    year={2025}
}
```
## License

This project is licensed under the MIT License.