File size: 5,487 Bytes
79dc332
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
# Image Synthesis

Image synthesis is the base feature of DiffSynth Studio. We can generate images with very high resolution.

### OmniGen

OmniGen is a text-image-to-image model, you can synthesize an image according to several given reference images.

|Reference image 1|Reference image 2|Synthesized image|
|-|-|-|
|![image_man](https://github.com/user-attachments/assets/35d00493-625b-45d1-ad2b-b558ea09fe36)|![image_woman](https://github.com/user-attachments/assets/abebf69b-3563-4b3b-91c2-c48ff74a29ea)|![image_merged](https://github.com/user-attachments/assets/2979d5a9-f355-4bec-a91d-1e824d9fc8f6)|

### Example: FLUX

Example script: [`flux_text_to_image.py`](./flux_text_to_image.py) and [`flux_text_to_image_low_vram.py`](./flux_text_to_image_low_vram.py)(low VRAM).

The original version of FLUX doesn't support classifier-free guidance; however, we believe that this guidance mechanism is an important feature for synthesizing beautiful images. You can enable it using the parameter `cfg_scale`, and the extra guidance scale introduced by FLUX is `embedded_guidance`.

|1024*1024 (original)|1024*1024 (classifier-free guidance)|2048*2048 (highres-fix)|
|-|-|-|
|![image_1024](https://github.com/user-attachments/assets/9cbd1f6f-4ac4-4f8b-bf46-218d812a15a0)|![image_1024_cfg](https://github.com/user-attachments/assets/984561e9-553d-4952-9443-79ce144f379f)|![image_2048_highres](https://github.com/user-attachments/assets/2e92b2f8-c177-454f-84f6-f6f5d3aaeeff)|

### Example: Stable Diffusion

Example script: [`sd_text_to_image.py`](./sd_text_to_image.py)

LoRA Training: [`../train/stable_diffusion/`](../train/stable_diffusion/)

|512*512|1024*1024|2048*2048|4096*4096|
|-|-|-|-|
|![512](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/55f679e9-7445-4605-9315-302e93d11370)|![1024](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/6fc84611-8da6-4a1f-8fee-9a34eba3b4a5)|![2048](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/9087a73c-9164-4c58-b2a0-effc694143fb)|![4096](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/edee9e71-fc39-4d1c-9ca9-fa52002c67ac)|

### Example: Stable Diffusion XL

Example script: [`sdxl_text_to_image.py`](./sdxl_text_to_image.py)

LoRA Training: [`../train/stable_diffusion_xl/`](../train/stable_diffusion_xl/)

|1024*1024|2048*2048|
|-|-|
|![1024](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/67687748-e738-438c-aee5-96096f09ac90)|![2048](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/584186bc-9855-4140-878e-99541f9a757f)|

### Example: Stable Diffusion 3

Example script: [`sd3_text_to_image.py`](./sd3_text_to_image.py)

LoRA Training: [`../train/stable_diffusion_3/`](../train/stable_diffusion_3/)

|1024*1024|2048*2048|
|-|-|
|![image_1024](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/4df346db-6f91-420a-b4c1-26e205376098)|![image_2048](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/1386c802-e580-4101-939d-f1596802df9d)|

### Example: Kolors

Example script: [`kolors_text_to_image.py`](./kolors_text_to_image.py)

LoRA Training: [`../train/kolors/`](../train/kolors/)

|1024*1024|2048*2048|
|-|-|
|![image_1024](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/53ef6f41-da11-4701-8665-9f64392607bf)|![image_2048](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/66bb7a75-fe31-44e5-90eb-d3140ee4686d)|

Kolors also support the models trained for SD-XL. For example, ControlNets and LoRAs. See [`kolors_with_sdxl_models.py`](./kolors_with_sdxl_models.py)

LoRA: https://civitai.com/models/73305/zyd232s-ink-style

|Base model|with LoRA (alpha=0.5)|with LoRA (alpha=1.0)|with LoRA (alpha=1.5)|
|-|-|-|-|
|![image_0 0](https://github.com/user-attachments/assets/a222eae3-6e0a-4ea6-b301-99e74e2bc11a)|![image_0 5](https://github.com/user-attachments/assets/e429c501-530c-43f6-a30b-9f97996c91a2)|![image_1 0](https://github.com/user-attachments/assets/0ddeed4b-250d-4b5c-a4fa-2db50f63bf1c)|![image_1 5](https://github.com/user-attachments/assets/db35a89d-6325-4422-921e-14fb6ad66c92)|

ControlNet: https://huggingface.co/xinsir/controlnet-union-sdxl-1.0

|Reference image|Depth image|with ControlNet|with ControlNet|
|-|-|-|-|
|![image_0 0](https://github.com/user-attachments/assets/a222eae3-6e0a-4ea6-b301-99e74e2bc11a)|![controlnet_input](https://github.com/user-attachments/assets/d16b2785-bc1f-4184-b170-ae90f1d704c1)|![image_depth_1](https://github.com/user-attachments/assets/90a94780-7b56-4786-8a25-aae118eda171)|![image_depth_2](https://github.com/user-attachments/assets/05eb1309-9c98-49e7-a8ee-f376ceedf18e)|

### Example: Hunyuan-DiT

Example script: [`hunyuan_dit_text_to_image.py`](./hunyuan_dit_text_to_image.py)

LoRA Training: [`../train/hunyuan_dit/`](../train/hunyuan_dit/)

|1024*1024|2048*2048|
|-|-|
|![image_1024](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/60b022c8-df3f-4541-95ab-bf39f2fa8bb5)|![image_2048](https://github.com/modelscope/DiffSynth-Studio/assets/35051019/87919ea8-d428-4963-8257-da05f3901bbb)|

### Example: Stable Diffusion XL Turbo

Example script: [`sdxl_turbo.py`](./sdxl_turbo.py)

We highly recommend you to use this model in the WebUI.

|"black car"|"red car"|
|-|-|
|![black_car](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/7fbfd803-68d4-44f3-8713-8c925fec47d0)|![black_car_to_red_car](https://github.com/Artiprocher/DiffSynth-Studio/assets/35051019/aaf886e4-c33c-4fd8-98e2-29eef117ba00)|