mrfakename committed on
Commit 505114a · verified · 1 Parent(s): 7f9a3a9

Update README.md

Files changed (1)
  1. README.md +0 -9
README.md CHANGED
@@ -16,15 +16,6 @@ https://github.com/vibevoice-community/VibeVoice
 
  ## VibeVoice: A Frontier Open-Source Text-to-Speech Model
 
- > This repository contains a copy of model weights obtained from ModelScope([microsoft/VibeVoice-7B](https://www.modelscope.cn/models/microsoft/VibeVoice-7B)).
- > The license for this model is the `MIT License`, **which permits redistribution**.
- >
- > My understanding of the MIT License, which is consistent with the broader open-source community's consensus,
- > is that it grants the right to distribute copies of the software and its derivatives.
- > Therefore, I am lawfully exercising the right to redistribute this model.
- >
- > If you are a rights holder and believe this understanding of the license is incorrect, please submit a DMCA complaint to Hugging Face at [email protected]_
-
  VibeVoice is a novel framework designed for generating expressive, long-form, multi-speaker conversational audio, such as podcasts, from text. It addresses significant challenges in traditional Text-to-Speech (TTS) systems, particularly in scalability, speaker consistency, and natural turn-taking.
 
  A core innovation of VibeVoice is its use of continuous speech tokenizers (Acoustic and Semantic) operating at an ultra-low frame rate of 7.5 Hz. These tokenizers efficiently preserve audio fidelity while significantly boosting computational efficiency for processing long sequences. VibeVoice employs a next-token diffusion framework, leveraging a Large Language Model (LLM) to understand textual context and dialogue flow, and a diffusion head to generate high-fidelity acoustic details.
 