---
pipeline_tag: text-to-image
inference: false
license: other
license_name: stabilityai-ai-community
license_link: LICENSE.md
tags:
- tensorrt
- sd3.5-large
- text-to-image
- onnx
- model-optimizer
- fp8
- quantization
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md)
  and acknowledge Stability AI's [Privacy
  Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options:
    - 'Yes'
    - 'No'
  What do you intend to use the model for?:
    type: select
    options:
    - Research
    - Personal use
    - Creative Professional
    - Startup
    - Enterprise
  I agree to the License Agreement and acknowledge Stability AI's Privacy Policy: checkbox
language:
- en
---

# Stable Diffusion 3.5 Large TensorRT
## Introduction

This repository hosts the **TensorRT-optimized version** of **Stable Diffusion 3.5 Large**, developed in collaboration between [Stability AI](https://stability.ai) and [NVIDIA](https://huggingface.co/nvidia). It uses NVIDIA's TensorRT deep learning inference library to reduce inference latency while maintaining the image quality of the original model.

Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model with improved image quality, typography, complex prompt understanding, and resource efficiency. The TensorRT optimization makes these capabilities practical for production deployment and real-time applications.

## Model Details

### Model Description
This repository holds the ONNX exports of the T5, MMDiT and VAE models in BF16 precision, plus an FP8 export of the MMDiT model. The MMDiT transformer was quantized to FP8 using [NVIDIA/TensorRT-Model-Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer).

## Performance using TensorRT 10.13
### Timings for 30 steps at 1024x1024

| Accelerator | Precision | CLIP-G   | CLIP-L  | T5      | MMDiT x 30 | VAE Decoder | Total      |
|-------------|-----------|----------|---------|---------|------------|-------------|------------|
| H100        | BF16      | 13.83 ms | 5.66 ms | 8.55 ms | 7945 ms    | 97.17 ms    | 8101.83 ms |
| H100        | FP8       | 11.69 ms | 4.83 ms | 9.89 ms | 7294.81 ms | 47.43 ms    | 7399.05 ms |

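
The timings are easier to compare after a little arithmetic. The sketch below is illustrative only: it copies the numbers from the table (the dictionary and function names are invented for this example) and derives the per-step MMDiT latency, the share of total time spent in MMDiT, the part of each Total not covered by the listed stages, and the BF16-to-FP8 speedup.

```python
# Timings copied from the table above (milliseconds, H100, 30 steps, 1024x1024).
TIMINGS_MS = {
    "BF16": {"CLIP-G": 13.83, "CLIP-L": 5.66, "T5": 8.55,
             "MMDiT x 30": 7945.0, "VAE Decoder": 97.17, "Total": 8101.83},
    "FP8": {"CLIP-G": 11.69, "CLIP-L": 4.83, "T5": 9.89,
            "MMDiT x 30": 7294.81, "VAE Decoder": 47.43, "Total": 7399.05},
}
STEPS = 30  # denoising steps used for the measurements


def summarize(row):
    """Derive a few ratios from one table row (hypothetical helper)."""
    stage_sum = sum(v for k, v in row.items() if k != "Total")
    return {
        "mmdit_ms_per_step": row["MMDiT x 30"] / STEPS,
        "mmdit_share": row["MMDiT x 30"] / row["Total"],
        # Part of the reported Total that no listed stage accounts for.
        "unlisted_ms": row["Total"] - stage_sum,
    }


def fp8_speedup(timings):
    """Ratio of BF16 total latency to FP8 total latency."""
    return timings["BF16"]["Total"] / timings["FP8"]["Total"]


if __name__ == "__main__":
    for precision, row in TIMINGS_MS.items():
        s = summarize(row)
        print(f"{precision}: {s['mmdit_ms_per_step']:.2f} ms/step, "
              f"MMDiT share {s['mmdit_share']:.1%}, "
              f"unlisted {s['unlisted_ms']:.2f} ms")
    print(f"FP8 speedup over BF16: {fp8_speedup(TIMINGS_MS):.3f}x")
```

On these numbers the MMDiT loop accounts for about 98% of total latency in both rows, and FP8 lowers the total by roughly 8.7%. Each Total is also about 30 ms higher than the sum of its listed stages; the table does not say what that remainder covers.
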

## Usage Example
1. Follow the [setup instructions](https://github.com/NVIDIA/TensorRT/blob/release/sd35/demo/Diffusion/README.md) to launch a TensorRT NGC container.
```shell
git clone https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git checkout release/sd35
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:25.01-py3 /bin/bash
```


2. Install libraries and requirements
```shell
cd demo/Diffusion
python3 -m pip install --upgrade pip
pip3 install -r requirements.txt
python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt-cu12
```


3. Generate a Hugging Face user access token
To download the Stable Diffusion 3.5 model checkpoints, request access on the [Stable Diffusion 3.5 Large](https://huggingface.co/stabilityai/stable-diffusion-3.5-large) page.
Then obtain a `read` access token for the Hugging Face Hub and export it as shown below. See the [instructions](https://huggingface.co/docs/hub/security-tokens).

```bash
export HF_TOKEN=<your access token>
```


4. Perform TensorRT optimized inference:

- **Stable Diffusion 3.5 Large in BF16 precision**

```shell
python3 demo_txt2img_sd35.py \
  "A chic urban apartment interior highlighting mid-century modern furniture, vibrant abstract art pieces on clean white walls, and large windows providing a stunning view of the bustling city below." \
  --version=3.5-large \
  --bf16 \
  --download-onnx-models \
  --denoising-steps=30 \
  --guidance-scale 3.5 \
  --build-static-batch \
  --use-cuda-graph \
  --hf-token=$HF_TOKEN
```

- **Stable Diffusion 3.5 Large using FP8 quantization**

```shell
python3 demo_txt2img_sd35.py \
  "A chic urban apartment interior highlighting mid-century modern furniture, vibrant abstract art pieces on clean white walls, and large windows providing a stunning view of the bustling city below." \
  --version=3.5-large \
  --fp8 \
  --denoising-steps=30 \
  --guidance-scale 3.5 \
  --download-onnx-models \
  --build-static-batch \
  --use-cuda-graph \
  --hf-token=$HF_TOKEN \
  --onnx-dir onnx_fp8 \
  --engine-dir engine_fp8
```
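
The two invocations above differ only in their precision flags and, for FP8, the separate ONNX and engine directories. As a rough sketch of how one might script them, the hypothetical helper below checks that `HF_TOKEN` has been exported and assembles the argument list from the flags shown above. The function and constant names are invented for this example, and it assumes the script does not care about flag order.

```python
import os
import shlex

DEMO_SCRIPT = "demo_txt2img_sd35.py"

# Flags common to both commands shown above.
COMMON_FLAGS = [
    "--version=3.5-large",
    "--denoising-steps=30",
    "--guidance-scale", "3.5",
    "--download-onnx-models",
    "--build-static-batch",
    "--use-cuda-graph",
]

# Precision-specific flags, copied from the two commands above.
PRECISION_FLAGS = {
    "bf16": ["--bf16"],
    "fp8": ["--fp8", "--onnx-dir", "onnx_fp8", "--engine-dir", "engine_fp8"],
}


def read_hf_token(env=None):
    """Return the exported token, or raise if it is missing or still a placeholder."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token or token.startswith("<"):
        raise RuntimeError("HF_TOKEN is not set; export a read access token first")
    return token


def build_command(prompt, precision, token):
    """Assemble the argv list for one demo run (hypothetical helper)."""
    if precision not in PRECISION_FLAGS:
        raise ValueError(f"unknown precision: {precision!r}")
    return ["python3", DEMO_SCRIPT, prompt,
            *COMMON_FLAGS, *PRECISION_FLAGS[precision],
            f"--hf-token={token}"]


if __name__ == "__main__":
    try:
        token = read_hf_token()
    except RuntimeError as err:
        print(f"skipping: {err}")
    else:
        cmd = build_command("A chic urban apartment interior", "fp8", token)
        # Print with the token masked; pass `cmd` to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd[:-1] + ["--hf-token=***"]))
```

Building the command as a list rather than a single string avoids shell-quoting problems with long prompts that contain commas or quotes.
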