---
license: apache-2.0
language:
- en
tags:
- video
- video-generation
- video-to-video
- controlnet
- diffusers
- wan2.2
---
|
# ControlNet for Wan2.2 (tile)
|
|
|
This repo contains the ControlNet (tile) module for Wan2.2. See the full code on [GitHub](https://github.com/TheDenk/wan2.2-controlnet).

It uses the same approach as the ControlNet for [Wan2.1](https://github.com/TheDenk/wan2.1-dilated-controlnet).
|
|
|
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/63fde49f6315a264aba6a7ed/nf0_13795_uaVEOuKodOK.mp4"></video>
|
|
|
### For ComfyUI

Use the [ComfyUI-WanVideoWrapper](https://github.com/kijai/ComfyUI-WanVideoWrapper) custom nodes.

![comfyui](./comfyui_example.png)
|
|
|
### Inference examples

#### Simple inference with CLI

```bash
python -m inference.cli_demo \
    --video_path "resources/bubble.mp4" \
    --prompt "Close-up shot with soft lighting, focusing sharply on the lower half of a young woman's face. Her lips are slightly parted as she blows an enormous bubblegum bubble. The bubble is semi-transparent, shimmering gently under the light, and surprisingly contains a miniature aquarium inside, where two orange-and-white goldfish slowly swim, their fins delicately fluttering as if in an aquatic universe. The background is a pure light blue color." \
    --controlnet_type "tile" \
    --base_model_path Wan-AI/Wan2.2-TI2V-5B-Diffusers \
    --controlnet_model_path TheDenk/wan2.2-ti2v-5b-controlnet-tile-v1
```
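Note: `inference.cli_demo`, as well as the modules imported in the next example (`wan_controlnet`, `wan_transformer`, `wan_t2v_controlnet_pipeline`), come from the [GitHub repository](https://github.com/TheDenk/wan2.2-controlnet), so clone it first and run the commands from inside the repo.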
|
#### Minimal code example

```python
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ["TOKENIZERS_PARALLELISM"] = "false"

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import AutoencoderKLWan, UniPCMultistepScheduler
from diffusers.utils import load_video, export_to_video

from wan_controlnet import WanControlnet
from wan_transformer import CustomWanTransformer3DModel
from wan_t2v_controlnet_pipeline import WanTextToVideoControlnetPipeline

base_model_path = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"
controlnet_model_path = "TheDenk/wan2.2-ti2v-5b-controlnet-tile-v1"

vae = AutoencoderKLWan.from_pretrained(base_model_path, subfolder="vae", torch_dtype=torch.float32)
transformer = CustomWanTransformer3DModel.from_pretrained(base_model_path, subfolder="transformer", torch_dtype=torch.bfloat16)
controlnet = WanControlnet.from_pretrained(controlnet_model_path, torch_dtype=torch.bfloat16)

pipe = WanTextToVideoControlnetPipeline.from_pretrained(
    pretrained_model_name_or_path=base_model_path,
    controlnet=controlnet,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.bfloat16,
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=5.0)
pipe.enable_model_cpu_offload()

img_h = 704       # or 480
img_w = 1280      # or 832
num_frames = 121  # or 81, 49


def apply_gaussian_blur(image, ksize=5, sigmaX=1.0):
    image_np = np.array(image)
    if ksize % 2 == 0:  # cv2.GaussianBlur requires an odd kernel size
        ksize += 1
    blurred_image = cv2.GaussianBlur(image_np, (ksize, ksize), sigmaX=sigmaX)
    return Image.fromarray(blurred_image)


# Build the "tile" condition: downscale, blur, then upscale back to the
# target size, so the control frames keep the layout but lose fine detail.
video_path = "bubble.mp4"
video_frames = load_video(video_path)[:num_frames]
ksize = 5
downscale_coef = 4
controlnet_frames = [x.resize((img_w // downscale_coef, img_h // downscale_coef)) for x in video_frames]
controlnet_frames = [apply_gaussian_blur(x, ksize=ksize, sigmaX=ksize // 2) for x in controlnet_frames]
controlnet_frames = [x.resize((img_w, img_h)) for x in controlnet_frames]

prompt = "Close-up shot with soft lighting, focusing sharply on the lower half of a young woman's face. Her lips are slightly parted as she blows an enormous bubblegum bubble. The bubble is semi-transparent, shimmering gently under the light, and surprisingly contains a miniature aquarium inside, where two orange-and-white goldfish slowly swim, their fins delicately fluttering as if in an aquatic universe. The background is a pure light blue color."
negative_prompt = "bad quality, worst quality"

output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=img_h,
    width=img_w,
    num_frames=num_frames,
    guidance_scale=5,
    generator=torch.Generator(device="cuda").manual_seed(42),
    output_type="pil",

    # ControlNet conditioning
    controlnet_frames=controlnet_frames,
    controlnet_guidance_start=0.0,
    controlnet_guidance_end=0.8,
    controlnet_weight=0.8,

    # TeaCache acceleration
    teacache_treshold=0.6,
).frames[0]

export_to_video(output, "output.mp4", fps=16)
```
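Because the tile condition is just a low-detail copy of the target video, you can also drive the pipeline with a genuinely low-resolution clip instead of degrading a high-resolution one. A minimal sketch, reusing `pipe`, `img_w`, `img_h`, and `num_frames` from the example above; the input file name is hypothetical:

```python
from diffusers.utils import load_video

# Hypothetical low-resolution source clip.
low_res_frames = load_video("my_low_res_clip.mp4")[:num_frames]

# Upscaling to the generation size is enough here: the source already
# lacks fine detail, so the extra downscale/blur step can be skipped.
controlnet_frames = [frame.resize((img_w, img_h)) for frame in low_res_frames]
```

Pass these frames as `controlnet_frames` in the `pipe(...)` call exactly as above.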
|
## Acknowledgements

Original code and models: [Wan2.2](https://github.com/Wan-Video/Wan2.2).
|
|
|
|
|
## Citations

```
@misc{TheDenk,
    title={Wan2.2 Controlnet},
    author={Karachev Denis},
    url={https://github.com/TheDenk/wan2.2-controlnet},
    publisher={GitHub},
    year={2025}
}
```
|
|
|
## Contacts

Issues should be raised directly in the repository. For professional support and recommendations, please contact [email protected].
|