---
base_model:
- Wan-AI/Wan2.1-I2V-14B-480P
language:
- en
library_name: diffusers
license: mit
pipeline_tag: image-to-video
---

# ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing

Traditional cartoon and anime production is time-consuming, requiring skilled artists for keyframing, inbetweening, and colorization. ToonComposer streamlines this with generative AI, turning hours of manual inbetweening and colorization work into a single, seamless process. Visit our project page and read our paper for more details. This HF model repo provides the model weights of ToonComposer. Code is available at our GitHub repo.
## Sample Usage
You can use the ToonComposer model with the `diffusers` library. For optimal performance, ensure your environment has the required dependencies, including `flash-attn`, as specified in the official GitHub repository.
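As a quick sanity check before loading the model, you can confirm that `flash-attn` is importable (the `flash-attn` pip package installs as the module `flash_attn`):

```python
# Optional sanity check: confirm flash-attn is installed before loading the model.
try:
    import flash_attn  # the `flash-attn` package installs as module `flash_attn`
    print(f"flash-attn {flash_attn.__version__} is available.")
except ImportError:
    print("flash-attn not found; install it per the GitHub instructions for best performance.")
```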
```python
import os

import numpy as np  # For creating dummy images if paths are not found
import torch
from diffusers import DiffusionPipeline
from PIL import Image

# Load the ToonComposer pipeline from the Hugging Face Hub
pipeline = DiffusionPipeline.from_pretrained(
    "TencentARC/ToonComposer",
    torch_dtype=torch.float16,  # Use torch.bfloat16 for newer GPUs or if preferred
    trust_remote_code=True,  # Required to load custom pipeline code
)
pipeline.to("cuda")  # Move model to GPU for faster inference (use "cpu" for CPU inference)

# --- Prepare your input data ---

# 1. Initial Colored Keyframe (Reference Image)
# This image sets the base visual style and initial frame.
# Replace 'path/to/your/initial_colored_keyframe.png' with your actual image file.
try:
    initial_colored_keyframe = Image.open("path/to/your/initial_colored_keyframe.png").convert("RGB")
except FileNotFoundError:
    print("Warning: Initial colored keyframe image not found. Using a dummy white image for demonstration.")
    # Create a dummy white image matching the target resolution (e.g., 1088x608 from the model config)
    initial_colored_keyframe = Image.fromarray(np.full((608, 1088, 3), 255, dtype=np.uint8))

# 2. Sketch Keyframe (for motion control)
# This is typically a black-and-white line drawing that guides motion at a specific frame.
# Replace 'path/to/your/sketch_at_frame_X.png' with your actual sketch image path.
# ToonComposer supports multiple sketches at different time steps; this example uses one.
try:
    sketch_keyframe = Image.open("path/to/your/sketch_at_frame_X.png").convert("RGB")
except FileNotFoundError:
    print("Warning: Sketch keyframe image not found. Using a dummy black image for demonstration.")
    # Create a dummy black image matching the target resolution
    sketch_keyframe = Image.fromarray(np.full((608, 1088, 3), 0, dtype=np.uint8))

# Text prompt: describe the desired motion or scene
prompt = "a joyful character bouncing a ball"

# Video generation parameters (adjust as needed)
# Refer to the model's config.json or the official GitHub repo for recommended values.
num_frames = 33  # Example number of frames from the model's config.json
height = 608  # Example resolution from the model's config.json
width = 1088  # Example resolution from the model's config.json
guidance_scale = 7.5  # Common value for text-to-image/video diffusion

# --- Generate the video frames ---
# The exact arguments for this custom pipeline might vary; we infer a plausible API
# based on common diffusion-model practices and the paper's description.
# `image` is the initial colored keyframe; `sketches` is a list of (frame_index, sketch) pairs.
video_frames = pipeline(
    prompt=prompt,
    image=initial_colored_keyframe,
    sketches=[(5, sketch_keyframe)],  # Example: apply `sketch_keyframe` at frame index 5
    num_frames=num_frames,
    height=height,
    width=width,
    guidance_scale=guidance_scale,
).frames  # The output is expected to be a list of PIL Images.

# --- Save or display the generated video frames ---
output_dir = "./tooncomposer_output"
os.makedirs(output_dir, exist_ok=True)
for i, frame in enumerate(video_frames):
    frame.save(f"{output_dir}/frame_{i:04d}.png")
print(f"Generated {len(video_frames)} frames to '{output_dir}'.")

# Optional: compile frames into a GIF (requires `imageio` and `imageio-ffmpeg`)
# import imageio
# try:
#     imageio.mimsave(f"{output_dir}/output_video.gif", video_frames, fps=10)
#     print(f"Saved GIF to '{output_dir}/output_video.gif'.")
# except Exception as e:
#     print(f"Could not save GIF (ensure imageio and imageio-ffmpeg are installed): {e}")
```
## Citation
If you find ToonComposer useful, please consider citing:
```bibtex
@article{li2025tooncomposer,
  title={ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing},
  author={Li, Lingen and Wang, Guangzhi and Zhang, Zhaoyang and Li, Yaowei and Li, Xiaoyu and Dou, Qi and Gu, Jinwei and Xue, Tianfan and Shan, Ying},
  journal={arXiv preprint arXiv:2508.10881},
  year={2025}
}
```