
Add `diffusers` sample usage to model card

#1 opened by nielsr (HF Staff)

Files changed (1)

  1. README.md +86 -6
README.md CHANGED
@@ -1,11 +1,11 @@
  ---
- license: mit
- language:
- - en
  base_model:
  - Wan-AI/Wan2.1-I2V-14B-480P
- pipeline_tag: image-to-video
+ language:
+ - en
  library_name: diffusers
+ license: mit
+ pipeline_tag: image-to-video
  ---

  <div align="center">
@@ -28,6 +28,87 @@ Traditional cartoon/anime production is time-consuming, requiring skilled artist
  ToonComposer streamlines this with generative AI, turning hours of manual inbetweening and colorization work into a single, seamless process. Visit our [project page](https://lg-li.github.io/project/tooncomposer) and read our [paper](https://arxiv.org/abs/2508.10881) for more details.
  This HF model repo provides the model weights of ToonComposer. Code is available at [our GitHub repo](https://github.com/TencentARC/ToonComposer).

+ ## Sample Usage
+
+ You can use the `ToonComposer` model with the `diffusers` library. Make sure your environment has the required dependencies installed, including `flash-attn`, as specified in the [official GitHub repository](https://github.com/TencentARC/ToonComposer), for optimal performance; a quick environment check is sketched below.
+
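+ This optional snippet is a minimal environment check. It assumes only the `torch` and `flash_attn` package names, not any ToonComposer-specific API:
+
+ ```python
+ import torch
+
+ # A CUDA GPU is strongly recommended; CPU inference will be extremely slow.
+ print("CUDA available:", torch.cuda.is_available())
+
+ # flash-attn is recommended by the authors for performance.
+ try:
+     import flash_attn  # noqa: F401
+     print("flash-attn is installed.")
+ except ImportError:
+     print("flash-attn not found; install it following the GitHub instructions.")
+ ```
+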
+ ```python
+ import os
+
+ import numpy as np  # for creating dummy images when the example paths are missing
+ import torch
+ from diffusers import DiffusionPipeline
+ from PIL import Image
+
+ # Load the ToonComposer pipeline from the Hugging Face Hub.
+ pipeline = DiffusionPipeline.from_pretrained(
+     "TencentARC/ToonComposer",
+     torch_dtype=torch.float16,  # use torch.bfloat16 on newer GPUs if preferred
+     trust_remote_code=True,     # required to load the custom pipeline code
+ )
+ pipeline.to("cuda")  # move the model to GPU; use "cpu" for (much slower) CPU inference
+
+ # --- Prepare your input data ---
+ # 1. Initial colored keyframe (reference image).
+ #    This image sets the base visual style and the initial frame.
+ #    Replace 'path/to/your/initial_colored_keyframe.png' with your actual image file.
+ try:
+     initial_colored_keyframe = Image.open("path/to/your/initial_colored_keyframe.png").convert("RGB")
+ except FileNotFoundError:
+     print("Warning: initial colored keyframe not found. Using a dummy white image for demonstration.")
+     # Create a dummy white image matching the target resolution (e.g., 1088x608 from the model config).
+     initial_colored_keyframe = Image.fromarray(np.full((608, 1088, 3), 255, dtype=np.uint8))
+
+ # 2. Sketch keyframe (for motion control).
+ #    This is typically a black-and-white line drawing that guides motion at a specific frame.
+ #    Replace 'path/to/your/sketch_at_frame_X.png' with your actual sketch image path.
+ #    ToonComposer supports multiple sketches at different time steps; this example uses one.
+ try:
+     sketch_keyframe = Image.open("path/to/your/sketch_at_frame_X.png").convert("RGB")
+ except FileNotFoundError:
+     print("Warning: sketch keyframe not found. Using a dummy black image for demonstration.")
+     # Create a dummy black image matching the target resolution.
+     sketch_keyframe = Image.fromarray(np.full((608, 1088, 3), 0, dtype=np.uint8))
+
+ # Text prompt describing the desired motion or scene.
+ prompt = "a joyful character bouncing a ball"
+
+ # Video generation parameters (adjust as needed; see the model's config.json
+ # and the official GitHub repo for recommended values).
+ num_frames = 33       # example value from the model's config.json
+ height = 608          # example resolution from the model's config.json
+ width = 1088          # example resolution from the model's config.json
+ guidance_scale = 7.5  # common value for text-to-image/video diffusion
+
+ # --- Generate the video frames ---
+ # The exact arguments of this custom pipeline may vary; this call is a plausible
+ # API inferred from common diffusion-pipeline practice and the paper's description.
+ # `image` takes the initial colored keyframe; `sketches` takes (frame index, sketch) pairs.
+ output = pipeline(
+     prompt=prompt,
+     image=initial_colored_keyframe,
+     sketches=[(5, sketch_keyframe)],  # apply `sketch_keyframe` at frame index 5
+     num_frames=num_frames,
+     height=height,
+     width=width,
+     guidance_scale=guidance_scale,
+ )
+ # The output is expected to be a list of PIL images; if the pipeline returns a
+ # batched result instead, take the first element (e.g., `output.frames[0]`).
+ video_frames = output.frames
+
+ # --- Save or display the generated video frames ---
+ output_dir = "./tooncomposer_output"
+ os.makedirs(output_dir, exist_ok=True)
+ for i, frame in enumerate(video_frames):
+     frame.save(f"{output_dir}/frame_{i:04d}.png")
+ print(f"Saved {len(video_frames)} frames to '{output_dir}'.")
+
+ # Optional: compile the frames into a GIF (requires `imageio` and `imageio-ffmpeg`).
+ # import imageio
+ # try:
+ #     imageio.mimsave(f"{output_dir}/output_video.gif", video_frames, fps=10)
+ #     print(f"Saved GIF to '{output_dir}/output_video.gif'.")
+ # except Exception as e:
+ #     print(f"Could not save GIF (ensure imageio and imageio-ffmpeg are installed): {e}")
+ ```
+
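+ As an alternative to per-frame PNGs, the frames can be written to an MP4 file with the `export_to_video` helper from `diffusers.utils`. This sketch assumes the pipeline returned plain PIL frames as above:
+
+ ```python
+ from diffusers.utils import export_to_video
+
+ # Compile the generated frames into an MP4 clip.
+ export_to_video(video_frames, "tooncomposer_output/output_video.mp4", fps=10)
+ ```
+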
  ## Citation

  If you find ToonComposer useful, please consider citing:

@@ -39,5 +120,4 @@ If you find ToonComposer useful, please consider citing:
  journal={arXiv preprint arXiv:2508.10881},
  year={2025}
  }
- ```
-
+ ```