Update README.md
* 🏆 **The Largest Image Generation MoE Model:** This is the largest open-source image generation Mixture of Experts (MoE) model to date. It features 64 experts and a total of 80 billion parameters, with 13 billion activated per token, significantly enhancing its capacity and performance.
* 🎨 **Superior Image Generation Performance:** Through rigorous dataset curation and advanced reinforcement learning post-training, we've achieved an optimal balance between semantic accuracy and visual excellence. The model demonstrates exceptional prompt adherence while delivering photorealistic imagery with stunning aesthetic quality and fine-grained details.
* 💭 **Intelligent World-Knowledge Reasoning:** The unified multimodal architecture endows HunyuanImage-3.0 with powerful reasoning capabilities. It leverages its extensive world knowledge to intelligently interpret user intent, automatically elaborating on sparse prompts with contextually appropriate details to produce superior, more complete visual outputs.
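To get a feel for the sparsity these figures imply, here is some back-of-the-envelope arithmetic using only the numbers quoted above (illustrative only, not a statement about the exact architecture):

```python
# Rough MoE sparsity math from the figures above.
total_params_b = 80.0    # total parameter count, in billions
active_params_b = 13.0   # parameters activated per token, in billions
num_experts = 64         # experts in the MoE layers

active_fraction = active_params_b / total_params_b
print(f"~{active_fraction:.0%} of parameters active per token")  # ~16%
```

In other words, each token pays roughly the compute cost of a 13B dense model while the full 80B of capacity remains available to the router.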
### 🔥 Quick Start with Transformers
#### 1️⃣ Download model weights
```bash
# Download from HuggingFace and rename the directory.
# Note: the directory name must not contain dots, which can cause issues when loading with Transformers.
hf download tencent/HunyuanImage-3.0 --local-dir ./HunyuanImage-3
```
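Because dots in the local directory name can break loading with Transformers (the issue the comment above warns about), a quick check after downloading may save a confusing error later. This is only a suggested sketch; the `./HunyuanImage-3` path mirrors the command above:

```python
from pathlib import Path

# Local path produced by the `hf download` command above.
local_dir = Path("./HunyuanImage-3")

# Dots in the directory name can cause issues when loading with
# Transformers, so fail fast if the name still contains one.
if "." in local_dir.name:
    raise ValueError(f"Rename {local_dir.name!r}: dots in the directory name break loading")
print(f"Directory name OK: {local_dir.name}")
```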
#### 2️⃣ Run with Transformers
```python
from transformers import AutoModelForCausalLM
# Load the model
model_id = "./HunyuanImage-3"
# Currently we cannot load the model directly from the HF model_id
# `tencent/HunyuanImage-3.0` because of the dot in the name.
kwargs = dict(
    attn_implementation="sdpa",  # Use "flash_attention_2" if FlashAttention is installed