EchoShot: Multi-Shot Portrait Video Generation

Jiahao Wang1 · Hualian Sheng2 · Sijia Cai2,† · Weizhan Zhang1,*
Caixia Yan1 · Yachuang Feng2 · Bing Deng2 · Jieping Ye2

1Xi'an Jiaotong University      2Alibaba Cloud

Paper PDF · Project Page · GitHub Page

📝 Intro

This is the official EchoShot model, which generates multiple video shots of the same person, with each shot controlled by a customized prompt. It currently supports text-to-multishot portrait video generation. We hope you have fun with this demo!

🔔 News

  • July 15, 2025: 🔥 EchoShot-1.3B-preview is now available at HuggingFace!
  • July 15, 2025: 🎉 Released the inference and training code.
  • May 25, 2025: We propose EchoShot, a multi-shot portrait video generation model.

📖 Citation

If you are inspired by our work, please cite our paper.

@article{wang2025echoshot,
  title={EchoShot: Multi-Shot Portrait Video Generation},
  author={Wang, Jiahao and Sheng, Hualian and Cai, Sijia and Zhang, Weizhan and Yan, Caixia and Feng, Yachuang and Deng, Bing and Ye, Jieping},
  journal={arXiv preprint arXiv:2506.15838},
  year={2025}
}