---
license: apache-2.0
tags:
- diffusion-single-file
- comfyui
- distillation
- lora
- video
- video generation
base_model:
- Wan-AI/Wan2.1-T2V-14B
- Wan-AI/Wan2.1-I2V-14B-480P
- Wan-AI/Wan2.1-I2V-14B-720P
library_name: diffusers
---
<div align="center">

# 🎬 Wan2.1 Distilled Models

### ⚡ High-Performance Video Generation with 4-Step Inference

*Distillation-accelerated versions of Wan2.1 - dramatically faster while maintaining exceptional quality*

---

[🤗 Hugging Face](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
[💻 GitHub](https://github.com/ModelTC/LightX2V)
[📄 License](LICENSE)

</div>

---
## 🌟 What's Special?

<table>
<tr>
<td width="50%">

### ⚡ Ultra-Fast Generation
- **4-step inference** (vs traditional 50+ steps)
- Up to **2x faster** than ComfyUI
- Real-time video generation capability

</td>
<td width="50%">

### 🎯 Flexible Options
- Multiple resolutions (480P/720P)
- Various precision formats (BF16/FP8/INT8)
- I2V and T2V support

</td>
</tr>
<tr>
<td width="50%">

### 💾 Memory Efficient
- FP8/INT8: **~50% size reduction**
- CPU offload support
- Optimized for consumer GPUs

</td>
<td width="50%">

### 🔧 Easy Integration
- Compatible with LightX2V framework
- ComfyUI support available
- Simple configuration files

</td>
</tr>
</table>

---
## 📦 Model Catalog

### 🎥 Model Types

<table>
<tr>
<td align="center" width="50%">

#### 🖼️ **Image-to-Video (I2V)**
Transform still images into dynamic videos
- 📺 480P Resolution
- 🎬 720P Resolution

</td>
<td align="center" width="50%">

#### 📝 **Text-to-Video (T2V)**
Generate videos from text descriptions
- 🚀 14B Parameters
- 🎨 High-quality synthesis

</td>
</tr>
</table>

### 🎯 Precision Variants

| Precision | Model Identifier | Model Size | Framework | Quality vs Speed |
|:---------:|:-----------------|:----------:|:---------:|:-----------------|
| 🏆 **BF16** | `lightx2v_4step` | ~28-32 GB | LightX2V | ⭐⭐⭐⭐⭐ Highest quality |
| ⚡ **FP8** | `scaled_fp8_e4m3_lightx2v_4step` | ~15-17 GB | LightX2V | ⭐⭐⭐⭐ Excellent balance |
| 🎯 **INT8** | `int8_lightx2v_4step` | ~15-17 GB | LightX2V | ⭐⭐⭐⭐ Fast & efficient |
| 🔷 **FP8 ComfyUI** | `scaled_fp8_e4m3_lightx2v_4step_comfyui` | ~15-17 GB | ComfyUI | ⭐⭐⭐ ComfyUI ready |

### 📝 Naming Convention

```bash
# Pattern: wan2.1_{task}_{resolution}_{precision}.safetensors

# Examples:
wan2.1_i2v_720p_lightx2v_4step.safetensors                          # 720P I2V - BF16
wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors          # 720P I2V - FP8
wan2.1_i2v_480p_int8_lightx2v_4step.safetensors                     # 480P I2V - INT8
wan2.1_t2v_14b_scaled_fp8_e4m3_lightx2v_4step_comfyui.safetensors   # T2V - FP8 ComfyUI
```
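If you prefer the command line to the web file browser, the Hub's tree API can list what is actually in this repo. A minimal sketch, assuming `curl` and Python 3 are available and the standard Hub tree endpoint (a JSON array with a `path` field per file):

```bash
# List all .safetensors files in this repo via the Hugging Face Hub tree API.
curl -s "https://huggingface.co/api/models/lightx2v/Wan2.1-Distill-Models/tree/main" \
  | python3 -c "import json,sys; [print(f['path']) for f in json.load(sys.stdin) if f['path'].endswith('.safetensors')]"
```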
> 💡 **Explore all models**: [Browse Full Model Collection →](https://huggingface.co/lightx2v/Wan2.1-Distill-Models/tree/main)
## 🚀 Usage

**LightX2V is a high-performance inference framework optimized for these models; it runs approximately 2x faster than ComfyUI and offers better quantization accuracy. Highly recommended!**

#### Quick Start

1. Download the model (720P I2V FP8 example)
```bash
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
  --local-dir ./models/wan2.1_i2v_720p \
  --include "wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors"
```
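According to the precision table above, the FP8 file should come in at roughly 15-17 GB; a quick way to confirm the download completed (path as used in the command above):

```bash
# Check that the FP8 weights landed in the expected location and size (~15-17 GB).
ls -lh ./models/wan2.1_i2v_720p/
```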
2. Clone the LightX2V repository

```bash
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
```

3. Install dependencies

```bash
pip install -r requirements.txt
```
Or refer to the [Quick Start Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/quickstart.md) to use Docker.
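If you install with pip, it may be worth confirming that PyTorch can see your GPU before going further; a minimal check, assuming torch is among the installed requirements (typical for video diffusion frameworks):

```bash
# Sanity-check the CUDA setup after installing dependencies.
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```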
4. Select and modify a configuration file

Choose the appropriate configuration for your GPU memory (a quick way to inspect the selected files is sketched after step 5):

**For 80GB+ GPUs (A100/H100)**
- I2V: [wan_i2v_distill_4step_cfg.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_i2v_distill_4step_cfg.json)
- T2V: [wan_t2v_distill_4step_cfg.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_t2v_distill_4step_cfg.json)

**For 24GB+ GPUs (RTX 4090)**
- I2V: [wan_i2v_distill_4step_cfg_4090.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_i2v_distill_4step_cfg_4090.json)
- T2V: [wan_t2v_distill_4step_cfg_4090.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_t2v_distill_4step_cfg_4090.json)
5. Run inference

```bash
cd scripts
bash wan/run_wan_i2v_distill_4step_cfg.sh
```
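The edits needed in step 4 and the paths used by the run script depend on your LightX2V version, so it can help to inspect both before launching. A minimal sketch from the repo root, using the 80GB I2V files named above (config keys and script variable names are version-dependent, so treat this as a starting point):

```bash
# Pretty-print the selected config before editing it; the exact keys
# (model path, offload settings, etc.) vary between LightX2V versions.
python3 -m json.tool configs/distill/wan_i2v_distill_4step_cfg.json

# Read the run script to see which model/config paths it expects.
cat scripts/wan/run_wan_i2v_distill_4step_cfg.sh
```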
#### Documentation
- **Quick Start Guide**: [LightX2V Quick Start](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/quickstart.md)
- **Complete Usage Guide**: [LightX2V Model Structure Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/model_structure.md)
- **Configuration Guide**: [Configuration Files](https://github.com/ModelTC/LightX2V/tree/main/configs/distill)
- **Quantization Usage**: [Quantization Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/method_tutorials/quantization.md)
- **Parameter Offload**: [Offload Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/method_tutorials/offload.md)

#### Performance Advantages

- ⚡ **Fast**: Approximately **2x faster** than ComfyUI
- 🎯 **Optimized**: Deeply optimized for distilled models
- 💾 **Memory Efficient**: Supports CPU offload and other memory optimization techniques
- 🛠️ **Flexible**: Supports multiple quantization formats and configuration options

### Community
- **Issues**: https://github.com/ModelTC/LightX2V/issues
## ⚠️ Important Notes

1. **Additional Components**: These model files contain only the DiT weights. You also need:
   - T5 text encoder
   - CLIP vision encoder
   - VAE encoder/decoder
   - Tokenizers

Refer to the [LightX2V Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/model_structure.md) for how to organize the complete model directory.
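One way to fetch those shared components is to pull everything except the original DiT shards from the matching base repository; a sketch, assuming the 720P I2V base model and that its DiT shards follow the usual `diffusion_pytorch_model*` naming (check the repo's file list and the model-structure guide for the exact layout LightX2V expects):

```bash
# Download the text encoder, CLIP, VAE and tokenizer files from the base repo,
# skipping the original (non-distilled) DiT shards, which this repo replaces.
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P \
  --local-dir ./models/wan2.1_i2v_720p \
  --exclude "diffusion_pytorch_model*"
```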
If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)