<h1 align='center'>MoDA: Multi-modal Diffusion Architecture for Talking Head Generation</h1>

<div align="center">

<strong>Authors</strong> <br><br>

Xinyang Li<sup>1,2</sup>,
Gen Li<sup>2</sup>,
Zhihui Lin<sup>1,3</sup>,
Yichen Qian<sup>1,3 †</sup>,
Gongxin Yao<sup>2</sup>,
Weinan Jia<sup>1</sup>,
Aowen Wang<sup>1</sup>,
Weihua Chen<sup>1,3</sup>,
Fan Wang<sup>1,3</sup> <br><br>

<sup>1</sup>Xunguang Team, DAMO Academy, Alibaba Group
<sup>2</sup>Zhejiang University
<sup>3</sup>Hupan Lab <br><br>

<sup>†</sup>Corresponding authors: [email protected], [email protected]

</div>
<br>
<div align='center'>
<a href='https://lixinyyang.github.io/MoDA.github.io/'><img src='https://img.shields.io/badge/Project-Page-blue'></a>
<a href='https://arxiv.org/abs/2507.03256'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
</div>

## 📂 Updates

* [2025.08.08] 🔥 We release our inference [code](https://github.com/lixinyyang/MoDA/) and [models](https://huggingface.co/lixinyizju/moda/).

## ⚙️ Installation

**Create environment:**

```bash
# 1. Create base environment
conda create -n moda python=3.10 -y
conda activate moda

# 2. Install requirements
pip install -r requirements.txt

# 3. Install ffmpeg
sudo apt-get update
sudo apt-get install ffmpeg -y
```
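**Download model weights (optional):** the released checkpoints are hosted on [Hugging Face](https://huggingface.co/lixinyizju/moda/). If you prefer to fetch them manually, the following is a minimal sketch assuming the `huggingface_hub` CLI is installed; the target directory `./pretrained_weights` is illustrative, not a path prescribed by this repository.

```bash
# Install the Hugging Face Hub CLI (skip if already available)
pip install -U "huggingface_hub[cli]"

# Download the released MoDA checkpoints into a local folder
# (./pretrained_weights is an assumed, illustrative location)
huggingface-cli download lixinyizju/moda --local-dir ./pretrained_weights
```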

## 🚀 Inference

```bash
python src/models/inference/moda_test.py --image_path src/examples/reference_images/6.jpg --audio_path src/examples/driving_audios/5.wav
```
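To drive several reference images with the same audio clip, a simple shell loop over the example folder works. This is only a sketch that reuses the two flags shown above; the glob assumes additional numbered images exist under `src/examples/reference_images/`.

```bash
# Run inference for every bundled reference image with one driving audio.
# The glob below is an assumption about the example folder's contents;
# point it at your own images as needed.
for img in src/examples/reference_images/*.jpg; do
    python src/models/inference/moda_test.py \
        --image_path "$img" \
        --audio_path src/examples/driving_audios/5.wav
done
```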

## ⚖️ Disclaimer

This project is intended for academic research, and we explicitly disclaim any responsibility for user-generated content. Users are solely liable for their actions while using the generative model. The project contributors have no legal affiliation with, nor accountability for, users' behavior. Please use the generative model responsibly, in accordance with ethical and legal standards.

## 🙏🏻 Acknowledgements

We would like to thank the contributors to the [LivePortrait](https://github.com/KwaiVGI/LivePortrait), [EchoMimic](https://github.com/antgroup/echomimic), [JoyVASA](https://github.com/jdh-algo/JoyVASA/), [Ditto](https://github.com/antgroup/ditto-talkinghead/), [Open Facevid2vid](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis), [InsightFace](https://github.com/deepinsight/insightface), [X-Pose](https://github.com/IDEA-Research/X-Pose), [DiffPoseTalk](https://github.com/DiffPoseTalk/DiffPoseTalk), [Hallo](https://github.com/fudan-generative-vision/hallo), [wav2vec 2.0](https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec), [Chinese Speech Pretrain](https://github.com/TencentGameMate/chinese_speech_pretrain), [Q-Align](https://github.com/Q-Future/Q-Align), [SyncNet](https://github.com/joonson/syncnet_python), and [VBench](https://github.com/Vchitect/VBench) repositories for their open research and extraordinary work.

If we have missed any open-source projects or related articles, we will update the acknowledgements promptly.

## 📑 Citation

If you use MoDA in your research, please cite:

```bibtex
@article{li2025moda,
  title={MoDA: Multi-modal Diffusion Architecture for Talking Head Generation},
  author={Li, Xinyang and Li, Gen and Lin, Zhihui and Qian, Yichen and Yao, GongXin and Jia, Weinan and Chen, Weihua and Wang, Fan},
  journal={arXiv preprint arXiv:2507.03256},
  year={2025}
}
```