DrUM (Draw Your Mind)

DrUM enables personalized text-to-image (T2I) generation by integrating reference prompts into T2I diffusion models. It works with foundation T2I models such as Stable Diffusion v1/v2/XL/v3 and FLUX, without requiring additional fine-tuning. DrUM leverages condition-level modeling in the latent space using a transformer-based adapter, and integrates seamlessly with open-source text encoders such as OpenCLIP and Google T5.

This repository provides the necessary components to run DrUM for inference. For the full source code, training scripts, and detailed documentation, please visit our official GitHub repository and read the research paper.

Quickstart

This model is designed for easy use with the diffusers library as a custom pipeline.

Installation

pip install torch torchvision diffusers transformers accelerate safetensors huggingface-hub

Usage

import torch

from diffusers import DiffusionPipeline
from pipeline import DrUM

# Load pipeline and attach DrUM
#drum = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", custom_pipeline = "Burf/DrUM", pipeline = "runwayml/stable-diffusion-v1-5", torch_dtype = torch.bfloat16, device = "cuda")
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype = torch.bfloat16).to("cuda")
drum = DrUM(pipeline)

# Generate personalized images
images = drum(
    prompt = "a photograph of an astronaut riding a horse",
    ref = ["A retro-futuristic space exploration movie poster with bold, vibrant colors"],
    weight = [1.0],
    alpha = 0.3
)

images[0].save("personalized_image.png")
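
The `ref` and `weight` arguments above are both lists, which suggests that several reference prompts can be combined in a single call. The sketch below assembles the keyword arguments for such a call and checks that the two lists line up. `build_drum_kwargs` is a hypothetical helper, not part of DrUM, and the rule that `ref` and `weight` must have equal length is an assumption drawn from the example above rather than documented behavior.

```python
from typing import Optional, Sequence


def build_drum_kwargs(
    prompt: str,
    refs: Sequence[str],
    weights: Optional[Sequence[float]] = None,
    alpha: float = 0.3,
) -> dict:
    """Hypothetical helper: assemble keyword arguments for a drum(...) call."""
    refs = list(refs)
    if not refs:
        raise ValueError("at least one reference prompt is required")
    if weights is None:
        # Assumption: equal weighting when no weights are given.
        weights = [1.0] * len(refs)
    weights = [float(w) for w in weights]
    # Assumption: one weight per reference prompt, as in the usage example.
    if len(weights) != len(refs):
        raise ValueError(
            f"got {len(refs)} reference prompts but {len(weights)} weights"
        )
    return {"prompt": prompt, "ref": refs, "weight": weights, "alpha": alpha}


kwargs = build_drum_kwargs(
    prompt="a photograph of an astronaut riding a horse",
    refs=[
        "A retro-futuristic space exploration movie poster with bold, vibrant colors",
        "A minimalist watercolor landscape in muted tones",
    ],
    weights=[0.7, 0.3],
)
```

With a `drum` object created as in the usage example, the result could be passed as `drum(**kwargs)`. The second reference prompt and the weight values are illustrative only.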

Supported foundation T2I models

DrUM works with a wide variety of foundation T2I models, provided they use text encoders with the same weights as those listed below:

| Architecture | Pipeline | Text encoder | DrUM weight |
| --- | --- | --- | --- |
| Stable Diffusion v1 | runwayml/stable-diffusion-v1-5, prompthero/openjourney-v4, stablediffusionapi/realistic-vision-v51, stablediffusionapi/deliberate-v2, stablediffusionapi/anything-v5, WarriorMama777/AbyssOrangeMix2, ... | openai/clip-vit-large-patch14 | L.safetensors |
| Stable Diffusion v2 | stabilityai/stable-diffusion-2-1, ... | openai/clip-vit-huge-patch14 | H.safetensors |
| Stable Diffusion XL | stabilityai/stable-diffusion-xl-base-1.0, ... | openai/clip-vit-large-patch14, laion/CLIP-ViT-bigG-14-laion2B-39B-b160k | L.safetensors, bigG.safetensors |
| Stable Diffusion v3 | stabilityai/stable-diffusion-3.5-large, stabilityai/stable-diffusion-3.5-medium, ... | openai/clip-vit-large-patch14, laion/CLIP-ViT-bigG-14-laion2B-39B-b160k, google/t5-v1_1-xxl | L.safetensors, bigG.safetensors, T5.safetensors |
| FLUX | black-forest-labs/FLUX.1-dev, ... | openai/clip-vit-large-patch14, google/t5-v1_1-xxl | L.safetensors, T5.safetensors |
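
To make the table easier to use from code, the sketch below transcribes the architecture-to-weight mapping into a dictionary with a small lookup function. `DRUM_WEIGHTS` and `weights_for` are hypothetical names introduced for illustration and are not part of the DrUM package. The sketch only returns the file names listed above and does not download or load anything.

```python
# Architecture -> DrUM weight files, transcribed from the table above.
# DRUM_WEIGHTS and weights_for are hypothetical names, not part of DrUM.
DRUM_WEIGHTS = {
    "Stable Diffusion v1": ["L.safetensors"],
    "Stable Diffusion v2": ["H.safetensors"],
    "Stable Diffusion XL": ["L.safetensors", "bigG.safetensors"],
    "Stable Diffusion v3": ["L.safetensors", "bigG.safetensors", "T5.safetensors"],
    "FLUX": ["L.safetensors", "T5.safetensors"],
}


def weights_for(architecture: str) -> list:
    """Return the DrUM weight files listed for an architecture."""
    try:
        # Return a copy so callers cannot mutate the table.
        return list(DRUM_WEIGHTS[architecture])
    except KeyError:
        supported = ", ".join(DRUM_WEIGHTS)
        raise ValueError(
            f"Unsupported architecture {architecture!r}; expected one of: {supported}"
        ) from None


flux_weights = weights_for("FLUX")
```

Each architecture needs one DrUM weight file per text encoder it uses, which is why Stable Diffusion v3 lists three files and Stable Diffusion v1 only one.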

Citation

@inproceedings{kim2025drum,
    title={Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models},
    author={Hyungjin Kim and Seokho Ahn and Young-Duk Seo},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    year={2025}
}

License

This project is licensed under the MIT License.
