---
license: apache-2.0
language:
- en
tags:
- video
- video-generation
- video-to-video
- controlnet
- diffusers
- wan2.2
---
|
# ControlNet for Wan2.2 (tile)
|
|
|
This repo contains the ControlNet (tile) module for Wan2.2. See the full code on [GitHub](https://github.com/TheDenk/wan2.2-controlnet).

It uses the same approach as the ControlNet for [Wan2.1](https://github.com/TheDenk/wan2.1-dilated-controlnet).
|
|
|
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/63fde49f6315a264aba6a7ed/nf0_13795_uaVEOuKodOK.mp4"></video>
|
|
|
### For ComfyUI

Use the [ComfyUI-WanVideoWrapper](https://github.com/kijai/ComfyUI-WanVideoWrapper) custom nodes.

![comfyui](./comfyui_example.png)
|
|
|
### Inference examples

#### Simple inference with CLI

```bash
python -m inference.cli_demo \
    --video_path "resources/bubble.mp4" \
    --prompt "Close-up shot with soft lighting, focusing sharply on the lower half of a young woman's face. Her lips are slightly parted as she blows an enormous bubblegum bubble. The bubble is semi-transparent, shimmering gently under the light, and surprisingly contains a miniature aquarium inside, where two orange-and-white goldfish slowly swim, their fins delicately fluttering as if in an aquatic universe. The background is a pure light blue color." \
    --controlnet_type "tile" \
    --base_model_path Wan-AI/Wan2.2-TI2V-5B-Diffusers \
    --controlnet_model_path TheDenk/wan2.2-ti2v-5b-controlnet-tile-v1
```
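Note: `inference.cli_demo`, as well as the modules imported in the next example (`wan_controlnet`, `wan_transformer`, `wan_t2v_controlnet_pipeline`), come from the [GitHub repository](https://github.com/TheDenk/wan2.2-controlnet), so clone it first and run the commands from inside the repo.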
|
#### Minimal code example

```python
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ["TOKENIZERS_PARALLELISM"] = "false"

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import AutoencoderKLWan, UniPCMultistepScheduler
from diffusers.utils import load_video, export_to_video

from wan_controlnet import WanControlnet
from wan_transformer import CustomWanTransformer3DModel
from wan_t2v_controlnet_pipeline import WanTextToVideoControlnetPipeline

base_model_path = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"
controlnet_model_path = "TheDenk/wan2.2-ti2v-5b-controlnet-tile-v1"

vae = AutoencoderKLWan.from_pretrained(base_model_path, subfolder="vae", torch_dtype=torch.float32)
transformer = CustomWanTransformer3DModel.from_pretrained(base_model_path, subfolder="transformer", torch_dtype=torch.bfloat16)
controlnet = WanControlnet.from_pretrained(controlnet_model_path, torch_dtype=torch.bfloat16)

pipe = WanTextToVideoControlnetPipeline.from_pretrained(
    pretrained_model_name_or_path=base_model_path,
    controlnet=controlnet,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.bfloat16,
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=5.0)
pipe.enable_model_cpu_offload()

img_h = 704       # or 480
img_w = 1280      # or 832
num_frames = 121  # or 81, 49


def apply_gaussian_blur(image, ksize=5, sigmaX=1.0):
    image_np = np.array(image)
    if ksize % 2 == 0:  # cv2.GaussianBlur requires an odd kernel size
        ksize += 1
    blurred_image = cv2.GaussianBlur(image_np, (ksize, ksize), sigmaX=sigmaX)
    return Image.fromarray(blurred_image)


# Build the "tile" condition: downscale, blur, then upscale back to the
# target size, so the control frames keep the layout but lose fine detail.
video_path = "bubble.mp4"
video_frames = load_video(video_path)[:num_frames]
ksize = 5
downscale_coef = 4
controlnet_frames = [x.resize((img_w // downscale_coef, img_h // downscale_coef)) for x in video_frames]
controlnet_frames = [apply_gaussian_blur(x, ksize=ksize, sigmaX=ksize // 2) for x in controlnet_frames]
controlnet_frames = [x.resize((img_w, img_h)) for x in controlnet_frames]

prompt = "Close-up shot with soft lighting, focusing sharply on the lower half of a young woman's face. Her lips are slightly parted as she blows an enormous bubblegum bubble. The bubble is semi-transparent, shimmering gently under the light, and surprisingly contains a miniature aquarium inside, where two orange-and-white goldfish slowly swim, their fins delicately fluttering as if in an aquatic universe. The background is a pure light blue color."
negative_prompt = "bad quality, worst quality"

output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=img_h,
    width=img_w,
    num_frames=num_frames,
    guidance_scale=5,
    generator=torch.Generator(device="cuda").manual_seed(42),
    output_type="pil",

    # ControlNet conditioning
    controlnet_frames=controlnet_frames,
    controlnet_guidance_start=0.0,
    controlnet_guidance_end=0.8,
    controlnet_weight=0.8,

    # TeaCache acceleration
    teacache_treshold=0.6,
).frames[0]

export_to_video(output, "output.mp4", fps=16)
```
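Because the tile condition is just a low-detail copy of the target video, you can also drive the pipeline with a genuinely low-resolution clip instead of degrading a high-resolution one. A minimal sketch, reusing `pipe`, `img_w`, `img_h`, and `num_frames` from the example above; the input file name is hypothetical:

```python
from diffusers.utils import load_video

# Hypothetical low-resolution source clip.
low_res_frames = load_video("my_low_res_clip.mp4")[:num_frames]

# Upscaling to the generation size is enough here: the source already
# lacks fine detail, so the extra downscale/blur step can be skipped.
controlnet_frames = [frame.resize((img_w, img_h)) for frame in low_res_frames]
```

Pass these frames as `controlnet_frames` in the `pipe(...)` call exactly as above.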
|
## Acknowledgements

Original code and models: [Wan2.2](https://github.com/Wan-Video/Wan2.2).
|
|
|
|
|
## Citations

```
@misc{TheDenk,
    title={Wan2.2 Controlnet},
    author={Karachev Denis},
    url={https://github.com/TheDenk/wan2.2-controlnet},
    publisher={GitHub},
    year={2025}
}
```
|
|
|
## Contacts

Issues should be raised directly in the repository. For professional support and recommendations, please contact [email protected].
|