MCM-Simplified


Model description


license: apache-2.0
tags:
  - text-to-video
  - motion-consistency
  - distillation

Motion Consistency Model - Simplified Implementation

This model is a distilled version of the Motion Consistency Model, trained on a subset of WebVid 2M with additional filtered image-caption pairs from the LAION aesthetic dataset.

Sample Generated Videos

| Caption | Teacher (ModelScope) - 50 DDIM Steps | Student (First Setup) - 4 Steps | Student (Second Setup) - 4 Steps |
|---|---|---|---|
| Worker slicing a piece of meat. | Image | Image | Image |
| Pancakes with chocolate syrup, nuts, and bananas. | Image | Image | Image |

Training Details

  • Dataset: 3022 video-caption pairs from WebVid 2M
  • Image Pairs:
    • Setup 1: 20K filtered LAION aesthetic images (min. resolution 450×450)
    • Setup 2: 7.5K filtered LAION aesthetic images (min. resolution 1024×1024)

Training Configurations

Setup 1

  • LR: 5e-6, Grad Accum: 4, Max Grad Norm: 10
  • Discriminator LR: 5e-5, Weight: 1, Lambda R1: 1e-5
  • EMA Decay: 0.95, Epochs: 7, Steps: ~5100
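
The EMA decay listed above controls how quickly the averaged copy of the student weights follows the weights being trained. The following is a minimal sketch of the standard exponential-moving-average update, using plain Python floats in place of tensors. The function name `ema_update` is hypothetical and is not taken from the original training code.

```python
def ema_update(ema_params, params, decay=0.95):
    """Return updated EMA values: decay * ema + (1 - decay) * current."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


# Toy example: the EMA copy drifts toward the current weights step by step.
ema = [0.0, 0.0]
weights = [1.0, 2.0]
for _ in range(3):
    ema = ema_update(ema, weights, decay=0.95)
```

Under this update rule, a decay of 0.95 absorbs about 64% of a weight change after 20 updates (1 - 0.95^20), so the averaged weights react fairly quickly to recent training steps.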

Setup 2 (Modified)

  • LR: 2e-6, Grad Accum: 16, Max Grad Norm: 5
  • Discriminator LR: 1e-6, Weight: 0.5, Lambda R1: 1e-4
  • EMA Decay: 0.98, LR Warmup: 300 steps, Epochs: 10
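
To make the differences between the two setups easier to compare, the hyperparameters listed above can be collected into a small config object. This is only a sketch: the class name `TrainConfig`, its field names, and the `warmup_lr` helper are hypothetical, the warmup is assumed to be linear (the card gives only its length), and Setup 1 is assumed to use no warmup because none is listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    lr: float
    grad_accum: int
    max_grad_norm: float
    disc_lr: float
    disc_weight: float
    lambda_r1: float
    ema_decay: float
    epochs: int
    warmup_steps: int = 0  # assumption: no warmup unless one is listed


SETUP_1 = TrainConfig(lr=5e-6, grad_accum=4, max_grad_norm=10.0,
                      disc_lr=5e-5, disc_weight=1.0, lambda_r1=1e-5,
                      ema_decay=0.95, epochs=7)
SETUP_2 = TrainConfig(lr=2e-6, grad_accum=16, max_grad_norm=5.0,
                      disc_lr=1e-6, disc_weight=0.5, lambda_r1=1e-4,
                      ema_decay=0.98, epochs=10, warmup_steps=300)


def warmup_lr(step, cfg):
    """Linear warmup to cfg.lr (assumed shape), then a constant rate."""
    if cfg.warmup_steps and step < cfg.warmup_steps:
        return cfg.lr * (step + 1) / cfg.warmup_steps
    return cfg.lr


# Ratio of discriminator LR to generator LR in each setup.
ratio_1 = SETUP_1.disc_lr / SETUP_1.lr
ratio_2 = SETUP_2.disc_lr / SETUP_2.lr
```

With these values the discriminator learning rate is 10 times the generator learning rate in Setup 1 and half of it in Setup 2.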

Evaluation

Fréchet Video Distance (FVD, lower is better)

| Model | 1 Step | 2 Steps | 4 Steps | 8 Steps |
|---|---|---|---|---|
| Teacher (50 DDIM Steps) | 2954.77 | - | - | - |
| Student - Setup 1 | 2598.15 | 2684.24 | 3082.84 | 3914.78 |
| Student - Setup 2 | 2589.01 | 3053.35 | 3284.69 | 3930.07 |
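
One way to read the FVD numbers is as a percentage change relative to the 50-step teacher. The sketch below uses only the values reported in the table; the helper name `relative_change` and the dictionary layout are illustrative choices, not part of the original evaluation code.

```python
TEACHER_FVD = 2954.77  # teacher, 50 DDIM steps

# Student FVD keyed by number of inference steps.
fvd = {
    "Student - Setup 1": {1: 2598.15, 2: 2684.24, 4: 3082.84, 8: 3914.78},
    "Student - Setup 2": {1: 2589.01, 2: 3053.35, 4: 3284.69, 8: 3930.07},
}


def relative_change(student, teacher=TEACHER_FVD):
    """Percent change in FVD versus the teacher (negative means lower FVD)."""
    return 100.0 * (student - teacher) / teacher


for name, scores in fvd.items():
    for steps, score in scores.items():
        print(f"{name}, {steps} step(s): {relative_change(score):+.1f}% vs teacher")
```

A negative value means the student's FVD is lower than the teacher's at that step count.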

CLIP Similarity (×100)

| Model | 1 Step | 2 Steps | 4 Steps | 8 Steps |
|---|---|---|---|---|
| Teacher (50 DDIM Steps) | 27.88 | - | - | - |
| Student - Setup 1 | 22.55 | 25.62 | 26.86 | 27.01 |
| Student - Setup 2 | 20.13 | 23.41 | 25.31 | 24.62 |
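
CLIP similarity is commonly computed as the cosine similarity between a caption embedding and video frame embeddings, scaled by 100. The sketch below illustrates that arithmetic on plain Python lists. It assumes the embeddings were already produced by a CLIP model and that the per-frame scores are averaged; this card does not specify the exact evaluation procedure, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_score(text_emb, frame_embs):
    """Mean text-to-frame cosine similarity, scaled by 100 (assumed averaging)."""
    sims = [cosine(text_emb, frame) for frame in frame_embs]
    return 100.0 * sum(sims) / len(sims)


# Toy embeddings: one frame aligned with the text, one orthogonal to it.
score = clip_score([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Higher values indicate that the generated frames sit closer to the caption in CLIP's embedding space.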

Conclusion

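Setup 2 was modified to stabilize training and prevent the discriminator from overpowering the generator. In the results above, the changes gave a marginally lower (better) FVD for 1-step inference (2589.01 vs. 2598.15), but FVD was higher than Setup 1 at 2, 4, and 8 steps. For both students, CLIP similarity generally rose with the number of inference steps, indicating better text-to-video alignment at higher step counts, although Setup 2 scored below Setup 1 at every step count and dipped slightly from 4 to 8 steps (25.31 to 24.62).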

References

Original Implementation: Motion Consistency Model

Download model

Weights for this model are available in Safetensors format.

Download them in the Files & versions tab.
