chendelong committed on
Commit
f6523e1
·
1 Parent(s): 647849e

Create README.md

<div align="center">

## 🎙 [Towards Joint Modeling of Dialogue Response and Speech Synthesis based on Large Language Model](https://huggingface.co/papers/2309.11000)

[Xinyu Zhou (周欣宇)](https://www.linkedin.com/in/xinyu-zhou2000/),     [Delong Chen (陈德龙)](https://chendelong.world/),     [Yudong Chen (陈玉东)](https://rwxy.cuc.edu.cn/2019/0730/c5134a133504/pagem.htm)

[arXiv](https://arxiv.org/abs/2309.11000) | [Poster](doc/YFRSW_Poster.pdf) | [Notebook](prosody_prediction.ipynb) | [GitHub](https://github.com/XinyuZhou2000/Spoken-Dialogue)

</div>

This project explores the potential of constructing an AI spoken dialogue system that *"thinks how to respond"* and *"thinks how to speak"* simultaneously, which aligns more closely with the human speech production process than the current cascade pipeline of independent chatbot and Text-to-Speech (TTS) modules.

We hypothesize that *Large Language Models (LLMs)* with billions of parameters possess significant speech understanding capabilities and can jointly model dialogue responses and linguistic features. To test this, we investigate prosodic structure prediction (PSP), a typical front-end task in TTS, as a demonstration of the speech understanding ability of LLMs.

Files changed (1)
  1. README.md +0 -0
README.md ADDED
File without changes