---
pipeline_tag: text-to-image
inference: false
license: other
license_name: stabilityai-ai-community
license_link: LICENSE.md
tags:
  - tensorrt
  - sd3.5-medium
  - text-to-image
  - onnx
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md)
  and acknowledge Stability AI's [Privacy
  Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options:
      - 'Yes'
      - 'No'
  What do you intend to use the model for?:
    type: select
    options:
      - Research
      - Personal use
      - Creative Professional
      - Startup
      - Enterprise
  I agree to the License Agreement and acknowledge Stability AI's Privacy Policy: checkbox
language:
  - en
---

# Stable Diffusion 3.5 Medium TensorRT

## Introduction

This repository hosts the TensorRT-optimized version of Stable Diffusion 3.5 Medium, developed in collaboration between Stability AI and NVIDIA. This implementation leverages NVIDIA's TensorRT deep learning inference library to deliver significant performance improvements while maintaining the exceptional image quality of the original model.

Stable Diffusion 3.5 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency. The TensorRT optimization makes these capabilities accessible for production deployment and real-time applications.

## Model Details

### Model Description

This repository holds the ONNX exports of the T5, MMDiT and VAE models in BF16 precision.
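If you want to inspect or reuse these exports outside the demo below (which can fetch them automatically via `--download-onnx-models`), they can be pulled with the Hugging Face CLI. The following is a minimal sketch, assuming this repository's id is `stabilityai/stable-diffusion-3.5-medium-tensorrt` and that the exports match the `*.onnx` pattern; adjust both to the actual repository layout:

```bash
# Sketch: download only the ONNX exports from this (gated) repository.
# The repo id and include patterns are assumptions; adjust them to the real layout.
huggingface-cli download stabilityai/stable-diffusion-3.5-medium-tensorrt \
  --include "*.onnx" "*.onnx_data" \
  --local-dir ./sd35-medium-onnx \
  --token $HF_TOKEN
```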

### Performance using TensorRT 10.13

#### Timings for 30 steps at 1024x1024

| Accelerator | Precision | CLIP-G | CLIP-L | T5 | MMDiT x 30 | VAE Decoder | Total |
|-------------|-----------|--------|--------|----|------------|-------------|-------|
| H100 | BF16 | 16.52 ms | 6.83 ms | 8.46 ms | 2358.34 ms | 72.58 ms | 2496.63 ms |
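As the table shows, nearly all of the latency sits in the MMDiT denoising loop: 2358.34 ms over 30 iterations works out to roughly 78.6 ms per step on H100, while the text encoders and the VAE decoder together contribute only about 104 ms.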

## Usage Example

1. Follow the setup instructions to launch a TensorRT NGC container.

```bash
git clone https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git checkout release/sd35
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:25.01-py3 /bin/bash
```
2. Install libraries and requirements.

```bash
cd demo/Diffusion
python3 -m pip install --upgrade pip
pip3 install -r requirements.txt
python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt-cu12
```
3. Generate a Hugging Face user access token. To download the Stable Diffusion 3.5 Medium checkpoints, request access on the Stable Diffusion 3.5 Medium model page. Then obtain a read access token for the Hugging Face Hub and export it as shown below. See instructions.

```bash
export HF_TOKEN=<your access token>
```
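Alternatively, `huggingface-cli login` stores the token in your local Hugging Face cache; the command below still passes it explicitly via `--hf-token`, so exporting `HF_TOKEN` is the simplest route inside the container.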
4. Perform TensorRT optimized inference:

- Stable Diffusion 3.5 Medium in BF16 precision

```bash
python3 demo_txt2img_sd35.py \
  "a beautiful photograph of Mt. Fuji during cherry blossom" \
  --version=3.5-medium \
  --bf16 \
  --download-onnx-models \
  --denoising-steps=30 \
  --guidance-scale 3.5 \
  --build-static-batch \
  --use-cuda-graph \
  --hf-token=$HF_TOKEN
```
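As the flag names suggest, `--download-onnx-models` pulls the prebuilt ONNX exports from this repository instead of exporting them locally, `--build-static-batch` builds TensorRT engines for a fixed batch size, and `--use-cuda-graph` uses CUDA graphs to reduce kernel-launch overhead. Note that the first run typically spends additional time building the TensorRT engines, so subsequent runs with the same settings are noticeably faster.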