<h1 align='center'>MoDA: Multi-modal Diffusion Architecture for Talking Head Generation</h1>

<div align="center">

<strong>Authors</strong> <br><br>

Xinyang&nbsp;Li<sup>1,2</sup>,&nbsp;
Gen&nbsp;Li<sup>2</sup>,&nbsp;
Zhihui&nbsp;Lin<sup>1,3</sup>,&nbsp;
Yichen&nbsp;Qian<sup>1,3&nbsp;†</sup>,&nbsp;
Gongxin&nbsp;Yao<sup>2</sup>,&nbsp;
Weinan&nbsp;Jia<sup>1</sup>,&nbsp;
Aowen&nbsp;Wang<sup>1</sup>,&nbsp;
Weihua&nbsp;Chen<sup>1,3</sup>,&nbsp;
Fan&nbsp;Wang<sup>1,3</sup> <br><br>

<sup>1</sup>Xunguang&nbsp;Team,&nbsp;DAMO&nbsp;Academy,&nbsp;Alibaba&nbsp;Group&nbsp;&nbsp;&nbsp;
<sup>2</sup>Zhejiang&nbsp;University&nbsp;&nbsp;&nbsp;
<sup>3</sup>Hupan&nbsp;Lab <br><br>

<sup>†</sup>Corresponding authors: [email protected],&nbsp;[email protected]

</div>
<br>
<div align='center'>
<a href='https://lixinyyang.github.io/MoDA.github.io/'><img src='https://img.shields.io/badge/Project-Page-blue'></a>
<a href='https://arxiv.org/abs/2507.03256'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
</div>

## 📂 Updates

* [2025.08.08] 🔥 We release our inference [code](https://github.com/lixinyyang/MoDA/) and [models](https://huggingface.co/lixinyizju/moda/).

## ⚙️ Installation

**Create environment:**

```bash
# 1. Create base environment
conda create -n moda python=3.10 -y
conda activate moda

# 2. Install requirements
pip install -r requirements.txt

# 3. Install ffmpeg
sudo apt-get update
sudo apt-get install ffmpeg -y
```
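
**Download pretrained weights:** the released checkpoints are hosted in the Hugging Face repository linked in the Updates section. A minimal download sketch using `huggingface_hub` is shown below; `./pretrained_weights` is only an assumed target directory, so check the repository layout for the path the inference script actually expects.

```python
# Minimal sketch: fetch the released MoDA checkpoints from the Hugging Face Hub.
# NOTE: "./pretrained_weights" is an assumed target directory; adjust it to the
# path the inference script actually expects.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="lixinyizju/moda",         # model repository from the Updates section
    local_dir="./pretrained_weights",  # assumed local directory
)
```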
## 🚀 Inference

```bash
python src/models/inference/moda_test.py --image_path src/examples/reference_images/6.jpg --audio_path src/examples/driving_audios/5.wav
```
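
To drive the same reference image with several audio clips, one option is a small wrapper that re-invokes the script once per clip. This is only an illustrative sketch that reuses the example paths above; how `moda_test.py` names and writes its outputs is not covered here.

```python
# Illustrative batch-inference sketch: loop the example command over several audio files.
import subprocess
from pathlib import Path

image_path = "src/examples/reference_images/6.jpg"
audio_dir = Path("src/examples/driving_audios")

for audio_path in sorted(audio_dir.glob("*.wav")):
    subprocess.run(
        [
            "python", "src/models/inference/moda_test.py",
            "--image_path", image_path,
            "--audio_path", str(audio_path),
        ],
        check=True,  # stop at the first failing run
    )
```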

## ⚖️ Disclaimer

This project is intended for academic research, and we explicitly disclaim any responsibility for user-generated content. Users are solely liable for their actions while using the generative model. The project contributors have no legal affiliation with, nor accountability for, users' behaviors. It is imperative to use the generative model responsibly, adhering to both ethical and legal standards.

## 🙏🏻 Acknowledgements

We would like to thank the contributors to the [LivePortrait](https://github.com/KwaiVGI/LivePortrait), [EchoMimic](https://github.com/antgroup/echomimic), [JoyVASA](https://github.com/jdh-algo/JoyVASA/), [Ditto](https://github.com/antgroup/ditto-talkinghead/), [Open Facevid2vid](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis), [InsightFace](https://github.com/deepinsight/insightface), [X-Pose](https://github.com/IDEA-Research/X-Pose), [DiffPoseTalk](https://github.com/DiffPoseTalk/DiffPoseTalk), [Hallo](https://github.com/fudan-generative-vision/hallo), [wav2vec 2.0](https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec), [Chinese Speech Pretrain](https://github.com/TencentGameMate/chinese_speech_pretrain), [Q-Align](https://github.com/Q-Future/Q-Align), [SyncNet](https://github.com/joonson/syncnet_python), and [VBench](https://github.com/Vchitect/VBench) repositories for their open research and extraordinary work.
If we have missed any open-source projects or related articles, we will add the corresponding acknowledgements promptly.

## 📑 Citation

If you use MoDA in your research, please cite:

```bibtex
@article{li2025moda,
  title={MoDA: Multi-modal Diffusion Architecture for Talking Head Generation},
  author={Li, Xinyang and Li, Gen and Lin, Zhihui and Qian, Yichen and Yao, GongXin and Jia, Weinan and Chen, Weihua and Wang, Fan},
  journal={arXiv preprint arXiv:2507.03256},
  year={2025}
}
```