Files changed (1)
  1. README.md +139 -3
README.md CHANGED
@@ -1,3 +1,139 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - zh
+ library_name: diffusers
+ pipeline_tag: image-to-image
+ ---
+ <p align="center">
+ <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/qwen_image_edit_logo.png" width="400"/>
+ </p>
+ <p align="center">
+ 💜 <a href="https://chat.qwen.ai/"><b>Qwen Chat</b></a>&nbsp&nbsp | &nbsp&nbsp🤗 <a href="https://huggingface.co/Qwen/Qwen-Image-Edit">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/models/Qwen/Qwen-Image-Edit">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Qwen_Image.pdf">Tech Report</a> &nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://qwenlm.github.io/blog/qwen-image-edit/">Blog</a> &nbsp&nbsp
+ <br>
+ 🖥️ <a href="https://huggingface.co/spaces/Qwen/qwen-image-edit">Demo</a>&nbsp&nbsp | &nbsp&nbsp💬 <a href="https://github.com/QwenLM/Qwen-Image/blob/main/assets/wechat.png">WeChat (微信)</a>&nbsp&nbsp | &nbsp&nbsp🫨 <a href="https://discord.gg/CV4E9rpNSD">Discord</a>&nbsp&nbsp | &nbsp&nbsp <a href="https://github.com/QwenLM/Qwen-Image">GitHub</a>&nbsp&nbsp
+ </p>
+
+ <p align="center">
+ <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_homepage.jpg" width="1600"/>
+ </p>
+
+
+ # Introduction
+ We are excited to introduce Qwen-Image-Edit, the image editing version of Qwen-Image. Built upon the 20B-parameter Qwen-Image model, Qwen-Image-Edit extends Qwen-Image’s text rendering capabilities to image editing tasks, enabling precise text editing. It also feeds the input image into both Qwen2.5-VL (for visual-semantic control) and the VAE Encoder (for visual-appearance control), which lets it handle both semantic and appearance editing. To try the latest model, visit [Qwen Chat](https://qwen.ai) and select the "Image Editing" feature.
+
+ Key Features:
+
+ * **Dual Semantic/Appearance Editing**: Qwen-Image-Edit supports both low-level visual appearance editing (e.g., adding, removing, or modifying elements while keeping certain regions of the image unchanged) and high-level visual semantic editing (e.g., character creation, object rotation, and style transfer, where overall pixel values may change but semantic consistency is preserved).
+ * **Precise Text Editing**: Qwen-Image-Edit supports bilingual (Chinese and English) text editing, allowing text within images to be added, deleted, or modified directly while preserving the original font, size, and style.
+ * **Strong Cross-Benchmark Performance**: Evaluations across multiple public benchmarks show that Qwen-Image-Edit achieves state-of-the-art (SOTA) results in image editing tasks, establishing it as a powerful foundation model for image editing.
+
+
+
+ ## Quick Start
+
+ Install the latest version of diffusers:
+ ```
+ pip install git+https://github.com/huggingface/diffusers
+ ```
+
+ The following code snippet shows how to use the model to edit an input image according to a text prompt:
+
+ ```python
+ import os
+
+ import torch
+ from PIL import Image
+ from diffusers import QwenImageEditPipeline
+
+ # Load the pipeline, cast it to bfloat16, and move it to the GPU.
+ pipeline = QwenImageEditPipeline.from_pretrained("Qwen/Qwen-Image-Edit")
+ print("pipeline loaded")
+ pipeline.to(torch.bfloat16)
+ pipeline.to("cuda")
+ pipeline.set_progress_bar_config(disable=None)
+
+ # The image to edit and the instruction describing the edit.
+ image = Image.open("./input.png").convert("RGB")
+ prompt = "Change the rabbit's color to purple, with a flash light background."
+ inputs = {
+     "image": image,
+     "prompt": prompt,
+     "generator": torch.manual_seed(0),  # fixed seed for reproducible output
+     "true_cfg_scale": 4.0,
+     "negative_prompt": " ",
+     "num_inference_steps": 50,
+ }
+
+ # Run the edit without tracking gradients, then save the first result.
+ with torch.inference_mode():
+     output = pipeline(**inputs)
+     output_image = output.images[0]
+     output_image.save("output_image_edit.png")
+     print("image saved at", os.path.abspath("output_image_edit.png"))
+ ```
+
+ ## Showcase
+ One of Qwen-Image-Edit’s standout capabilities is dual semantic and appearance editing. Semantic editing refers to modifying an image while preserving its original visual semantics. For instance, let’s start with Qwen’s mascot, Capybara:
+ ![Capybara](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片3.JPG#center)
+ Although every pixel in the edited image differs from the input (the leftmost image), the character identity of Capybara remains consistent. This semantic editing capability enables effortless creation and modification of original IPs. For example, using a series of prompts, we expanded the set to create a full MBTI meme series:
+ ![MBTI meme series](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片4.JPG#center)
+ Semantic editing is also highly valuable in portrait generation. Given a person’s photo, Qwen-Image-Edit can alter their pose, clothing, or even facial proportions while preserving their facial structure:
+ ![Portrait generation](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片10.JPG#center)
+ Another key application of semantic editing is viewpoint transformation. As shown below, Qwen-Image-Edit can rotate objects not only by 90 degrees but also by 180 degrees, revealing the back of the object:
+ ![Viewpoint transformation 90 degrees](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片12.JPG#center)
+ ![Viewpoint transformation 180 degrees](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片13.JPG#center)
+ Another example of semantic editing is style transfer. Given a portrait, Qwen-Image-Edit can easily transform it into various styles such as Studio Ghibli, which is particularly useful for creating avatars or character IDs:
+ ![Style transfer](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片1.JPG#center)
+ In addition to semantic editing, appearance editing addresses a different class of editing needs. Appearance editing requires certain regions of the image to remain completely unchanged. Common examples are adding, removing, or modifying elements.
+ Below, we demonstrate adding a signboard to an image. Notably, Qwen-Image-Edit not only adds the signboard but also generates a corresponding reflection:
+ ![Adding a signboard](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片6.JPG#center)
+ Here’s another interesting example, removing fine strands of hair:
+ ![Removing fine strands of hair](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片7.JPG#center)
+ The next example shows how to modify the color of text in an image, changing the letter "n" to blue:
+ ![Modifying text color](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片8.JPG#center)
+ Appearance editing is also crucial for modifying human poses, backgrounds, and clothing, as demonstrated in the following three images:
+ ![Modifying human poses](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片9.JPG#center)
+ ![Modifying backgrounds](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片11.JPG#center)
+ ![Modifying clothing](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片5.JPG#center)
+ Additionally, appearance editing can be used for photo colorization, such as transforming old black-and-white photos into color:
+ ![Photo colorization](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片2.JPG#center)
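
The appearance-editing requirement described above, that regions outside the edit remain completely unchanged, can be checked numerically. The following is a minimal sketch and not part of Qwen-Image-Edit: `changed_bbox` is a hypothetical helper that uses Pillow's `ImageChops.difference` to find the bounding box of all pixels that differ between an input image and its edited version. It assumes both images have the same size, so an edited result with a different resolution would need to be resized first.

```python
from PIL import Image, ImageChops


def changed_bbox(before, after):
    """Return the (left, upper, right, lower) box enclosing all changed pixels.

    Hypothetical helper for illustration only. Returns None when the two
    images are pixel-identical. Both images must have the same size.
    """
    if before.size != after.size:
        raise ValueError("images must have the same size")
    # Per-pixel absolute difference; unchanged pixels become (0, 0, 0).
    diff = ImageChops.difference(before.convert("RGB"), after.convert("RGB"))
    # getbbox() ignores all-zero pixels, so it bounds only the changed area.
    return diff.getbbox()


if __name__ == "__main__":
    # Synthetic stand-ins for an input image and an edited result.
    before = Image.new("RGB", (64, 64), "white")
    after = before.copy()
    after.paste((255, 0, 0), (20, 20, 30, 30))  # simulate a local edit
    print(changed_bbox(before, after))
```

If the returned box stays within the area the prompt asked to change, the rest of the image was left untouched. For semantic edits, where every pixel may differ, the box would typically cover the whole image.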
+ The second hallmark of Qwen-Image-Edit is its accurate text editing, made possible by Qwen-Image’s powerful text rendering capabilities.
+ For example, the following two images demonstrate Qwen-Image-Edit’s ability to edit English text:
+ ![Editing English text 1](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片15.JPG#center)
+ ![Editing English text 2](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片16.JPG#center)
+ Qwen-Image-Edit can also edit Chinese posters, modifying both large and small text elements:
+ ![Editing Chinese posters](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片17.JPG#center)
+ Finally, let’s walk through a concrete example showing how sequential editing can correct errors in a calligraphy artwork originally generated by Qwen-Image:
+ ![Calligraphy artwork](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片18.JPG#center)
+ This artwork contains several incorrect characters. We can progressively correct them using Qwen-Image-Edit. For instance, we can draw bounding boxes directly on the original image and instruct Qwen-Image-Edit to fix the highlighted parts. Here, we correct “稽” within the red box and “亭” within the blue box:
+ ![Correcting characters](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片19.JPG#center)
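
Marking regions with colored boxes, as described above, can be done with any image tool before the image is passed to the model. The following is a minimal sketch using Pillow's `ImageDraw`; the helper name `draw_boxes`, the coordinates, and the blank canvas are illustrative assumptions, not values from the original example.

```python
from PIL import Image, ImageDraw


def draw_boxes(image, boxes, width=4):
    """Return a copy of `image` with one outlined rectangle per (box, color) pair.

    Hypothetical helper for illustration only. `boxes` is a list of
    ((left, top, right, bottom), color) tuples in pixel coordinates.
    """
    marked = image.copy()  # leave the caller's image untouched
    draw = ImageDraw.Draw(marked)
    for box, color in boxes:
        draw.rectangle(box, outline=color, width=width)
    return marked


if __name__ == "__main__":
    # Placeholder canvas and coordinates; a real run would load the artwork
    # and use the positions of the characters to be corrected.
    canvas = Image.new("RGB", (400, 300), "white")
    marked = draw_boxes(
        canvas,
        [((40, 40, 120, 120), "red"), ((200, 60, 280, 140), "blue")],
    )
    print(marked.size)
```

The marked image could then be supplied as `image` in the Quick Start `inputs`, together with a prompt that refers to the red and blue boxes. The exact prompt wording used for the calligraphy example is not given here.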
+ Unfortunately, the character “稽” is uncommon, and the model initially fails to correct it: the lower-right component should be “旨”, not “日”. We can further highlight the incorrect “日” with a red box and prompt Qwen-Image-Edit to fine-tune that region into “旨”:
+ ![Fine-tuning character](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片20.JPG#center)
+ Amazing, right? Following this iterative approach, we can progressively correct all errors until reaching the final version:
+ ![Final version 1](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片21.JPG#center)
+ ![Final version 2](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片22.JPG#center)
+ ![Final version 3](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片23.JPG#center)
+ ![Final version 4](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片24.JPG#center)
+ ![Final version 5](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit/幻灯片25.JPG#center)
+ Ultimately, we obtain a fully correct calligraphy version of Lantingji Xu (Preface to the Poems Composed at the Orchid Pavilion)!
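
The iterative correction described above amounts to feeding each edited image back in as the input for the next prompt. The following is a minimal sketch of that loop; `apply_edits` and the stand-in `fake_edit` function are hypothetical names introduced for illustration, and the edit step is passed in as a callable so the sketch runs without a GPU or model weights.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def apply_edits(edit: Callable[[T, str], T], image: T, prompts: Iterable[str]) -> List[T]:
    """Apply prompts one after another, feeding each result into the next step.

    Hypothetical helper for illustration only. Returns every intermediate
    result, starting with the original image, so a failed step can be
    inspected and retried with a more specific prompt.
    """
    history = [image]
    for prompt in prompts:
        history.append(edit(history[-1], prompt))
    return history


if __name__ == "__main__":
    # Stand-in edit function: it only records which prompts were applied.
    def fake_edit(image: str, prompt: str) -> str:
        return f"{image} -> {prompt}"

    steps = apply_edits(fake_edit, "input", ["fix red box", "fix blue box"])
    print(steps[-1])
```

With the Quick Start pipeline loaded, `edit` could be a small wrapper such as `lambda img, p: pipeline(image=img, prompt=p, num_inference_steps=50).images[0]`. Keeping the full history is a design choice that makes it easy to go back one step when an edit, such as the first attempt at “稽”, does not come out right.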
+ In summary, we hope Qwen-Image-Edit will further advance the field of image generation, significantly lower the technical barriers to visual content creation, and inspire even more innovative applications.
+
+
+ ## License Agreement
+
+ Qwen-Image is licensed under Apache 2.0.
+
+ ## Citation
+
+ We kindly encourage citation of our work if you find it useful.
+
+ ```bibtex
+ @misc{wu2025qwenimagetechnicalreport,
+       title={Qwen-Image Technical Report},
+       author={Chenfei Wu and Jiahao Li and Jingren Zhou and Junyang Lin and Kaiyuan Gao and Kun Yan and Sheng-ming Yin and Shuai Bai and Xiao Xu and Yilei Chen and Yuxiang Chen and Zecheng Tang and Zekai Zhang and Zhengyi Wang and An Yang and Bowen Yu and Chen Cheng and Dayiheng Liu and Deqing Li and Hang Zhang and Hao Meng and Hu Wei and Jingyuan Ni and Kai Chen and Kuan Cao and Liang Peng and Lin Qu and Minggang Wu and Peng Wang and Shuting Yu and Tingkun Wen and Wensen Feng and Xiaoxiao Xu and Yi Wang and Yichang Zhang and Yongqiang Zhu and Yujia Wu and Yuxuan Cai and Zenan Liu},
+       year={2025},
+       eprint={2508.02324},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV},
+       url={https://arxiv.org/abs/2508.02324},
+ }
+ ```