|
--- |
|
base_model: |
|
- Wan-AI/Wan2.1-I2V-14B-480P |
|
language: |
|
- en |
|
library_name: diffusers |
|
license: mit |
|
pipeline_tag: image-to-video |
|
--- |
|
|
|
<div align="center"> |
|
<img src='https://github.com/TencentARC/ToonComposer/raw/main/samples/ToonComposer-Icon.png' width='120px'> |
|
</div> |
|
|
|
<p align="center"> <b> ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing </b> </p> |
|
|
|
<p align="center"> <a href='https://lg-li.github.io/project/tooncomposer'><img src='https://img.shields.io/badge/Project-Page-Green'></a> |
|
<a href='https://huggingface.co/TencentARC/ToonComposer'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue'></a> |
|
<a href="https://arxiv.org/abs/2508.10881"><img src="https://img.shields.io/static/v1?label=ArxivPreprint&message=ToonComposer&color=red&logo=arxiv"></a> |
|
<a href="https://github.com/TencentARC/ToonComposer"><img src="https://img.shields.io/static/v1?label=Github&message=ToonComposer&color=black&logo=github"></a> |
|
</p> |
|
|
|
<div align="center"> |
|
<img src='https://github.com/TencentARC/ToonComposer/raw/main/samples/ToonComposer-TLDR.jpg' width='550px'> |
|
</div> |
|
|
|
Traditional cartoon/anime production is time-consuming, requiring skilled artists for keyframing, inbetweening, and colorization. |
|
ToonComposer streamlines this with generative AI, turning hours of manual inbetweening and colorization into a single, seamless process. Visit our [project page](https://lg-li.github.io/project/tooncomposer) and read our [paper](https://arxiv.org/abs/2508.10881) for more details.
|
This Hugging Face repository provides the model weights for ToonComposer. The code is available at [our GitHub repo](https://github.com/TencentARC/ToonComposer).
|
|
|
## Sample Usage |
|
|
|
You can use the `ToonComposer` model with the `diffusers` library. For optimal performance, ensure your environment has the required dependencies, including `flash-attn`, as specified in the [official GitHub repository](https://github.com/TencentARC/ToonComposer).
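Before loading the model, it can help to confirm the core dependencies are importable. A minimal sanity check (the package list below is an assumption based on the dependencies mentioned above; see the GitHub repository's requirements for the authoritative list):

```python
import importlib.util

# Assumed core dependencies; adjust to match the repo's requirements file.
for pkg in ("torch", "diffusers", "flash_attn"):
    status = "available" if importlib.util.find_spec(pkg) else "MISSING"
    print(f"{pkg}: {status}")
```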
|
|
|
```python |
|
import torch |
|
from diffusers import DiffusionPipeline |
|
from PIL import Image |
|
import os  # For creating the output directory
import numpy as np  # For creating dummy images if the example paths are not found
|
|
|
# Load the ToonComposer pipeline from the Hugging Face Hub |
|
pipeline = DiffusionPipeline.from_pretrained(
    "TencentARC/ToonComposer",
    torch_dtype=torch.float16,  # Use torch.bfloat16 for newer GPUs or if preferred
    trust_remote_code=True,  # Required to load custom pipeline code
)
|
pipeline.to("cuda") # Move model to GPU for faster inference (can use "cpu" for CPU inference) |
|
|
|
# --- Prepare your input data --- |
|
# 1. Initial Colored Keyframe (Reference Image) |
|
# This image sets the base visual style and initial frame. |
|
# Replace 'path/to/your/initial_colored_keyframe.png' with your actual image file. |
|
try:
    initial_colored_keyframe = Image.open("path/to/your/initial_colored_keyframe.png").convert("RGB")
except FileNotFoundError:
    print("Warning: Initial colored keyframe image not found. Using a dummy white image for demonstration.")
    # Create a dummy white image matching the target resolution (e.g., 1088x608 from the model config)
    initial_colored_keyframe = Image.fromarray(np.full((608, 1088, 3), 255, dtype=np.uint8))
|
|
|
# 2. Sketch Keyframe (for motion control) |
|
# This is typically a black and white line drawing that guides motion at a specific frame. |
|
# Replace 'path/to/your/sketch_at_frame_X.png' with your actual sketch image path. |
|
# ToonComposer supports multiple sketches at different time steps (see the note after
# the pipeline call below). For this example, we use one.
|
try:
    sketch_keyframe = Image.open("path/to/your/sketch_at_frame_X.png").convert("RGB")
except FileNotFoundError:
    print("Warning: Sketch keyframe image not found. Using a dummy black image for demonstration.")
    # Create a dummy black image matching the target resolution
    sketch_keyframe = Image.fromarray(np.full((608, 1088, 3), 0, dtype=np.uint8))
|
|
|
# Text Prompt: Describe the desired motion or scene |
|
prompt = "a joyful character bouncing a ball" |
|
|
|
# Video Generation Parameters (adjust as needed) |
|
# Refer to the model's config.json or official GitHub for recommended values. |
|
num_frames = 33 # Example number of frames from model's config.json |
|
height = 608 # Example resolution from model's config.json |
|
width = 1088 # Example resolution from model's config.json |
|
guidance_scale = 7.5 # Common value for text-to-image/video diffusion |
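# Optional (assumption): for reproducible results, seed a generator and pass it to the
# pipeline call below as `generator=generator`, if the custom pipeline follows the
# common diffusers convention for this argument.
generator = torch.Generator(device="cuda").manual_seed(42)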
|
|
|
# --- Generate the video frames --- |
|
# The exact arguments for this custom pipeline might vary. |
|
# We infer a plausible API based on common diffusion model practices and the paper's description. |
|
# The `image` argument is for the initial colored keyframe, and `sketches` for the list of sketch images. |
|
video_frames = pipeline(
    prompt=prompt,
    image=initial_colored_keyframe,
    sketches=[(5, sketch_keyframe)],  # Example: apply `sketch_keyframe` at frame index 5
    num_frames=num_frames,
    height=height,
    width=width,
    guidance_scale=guidance_scale,
).frames  # The output is expected to be a list of PIL Images.
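# Note (assumption): to constrain motion at several points in time, multiple sketches can
# presumably be passed in the same format, e.g. sketches=[(5, sketch_a), (20, sketch_b)].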
|
|
|
# --- Save or display the generated video frames --- |
|
output_dir = "./tooncomposer_output"
os.makedirs(output_dir, exist_ok=True)
|
for i, frame in enumerate(video_frames):
    frame.save(f"{output_dir}/frame_{i:04d}.png")
|
print(f"Generated {len(video_frames)} frames to '{output_dir}'.") |
|
|
|
# Optional: To compile frames into a GIF (requires `imageio` and `imageio-ffmpeg` to be installed)
# import imageio
# try:
#     imageio.mimsave(f"{output_dir}/output_video.gif", video_frames, fps=10)
#     print(f"Saved GIF to '{output_dir}/output_video.gif'.")
# except Exception as e:
#     print(f"Could not save GIF (ensure imageio and imageio-ffmpeg are installed): {e}")
|
``` |
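Alternatively, `diffusers` ships an `export_to_video` helper that writes frames straight to an MP4 file. A minimal sketch, assuming `video_frames` is the list of PIL images produced above (depending on your `diffusers` version, this helper relies on `opencv-python` or `imageio` under the hood):

```python
from diffusers.utils import export_to_video

# Writes the generated frames to an MP4 at 10 frames per second.
export_to_video(video_frames, "tooncomposer_output/output_video.mp4", fps=10)
```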
|
|
|
## Citation |
|
|
|
If you find ToonComposer useful, please consider citing: |
|
|
|
```bibtex
|
@article{li2025tooncomposer, |
|
title={ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing}, |
|
author={Li, Lingen and Wang, Guangzhi and Zhang, Zhaoyang and Li, Yaowei and Li, Xiaoyu and Dou, Qi and Gu, Jinwei and Xue, Tianfan and Shan, Ying}, |
|
journal={arXiv preprint arXiv:2508.10881}, |
|
year={2025} |
|
} |
|
``` |