Update README.md #1
by yanchaomars - opened

README.md CHANGED
@@ -1,3 +1,6 @@
+---
+{}
+---
 # Step-Audio-TTS-3B
 
 
@@ -6,7 +9,7 @@ Step-Audio-TTS-3B 是业界首个基于大规模合成数据和LLM-Chat范式训
 Step-Audio-TTS-3B represents the industry's first Text-to-Speech (TTS) model trained on a large-scale synthetic dataset utilizing the LLM-Chat paradigm. It has achieved SOTA Character Error Rate (CER) results on the SEED TTS Eval benchmark. The model supports multiple languages, a variety of emotional expressions, and diverse voice style controls. Notably, Step-Audio-TTS-3B is also the first TTS model in the industry capable of generating RAP and Humming, marking a significant advancement in the field of speech synthesis.
 
 
-
+本仓库提供采用dual-codebook训练的StepAudio-TTS-3B LLM 模型权重,基于dual-codebook训练的vocoder,以及为哼唱专门训练的vocoder。
 
 This repository provides the model weights for StepAudio-TTS-3B, which is a dual-codebook trained LLM (Large Language Model) for text-to-speech synthesis. Additionally, it includes a vocoder trained using the dual-codebook approach, as well as a specialized vocoder specifically optimized for humming generation. These resources collectively enable high-quality speech synthesis and humming capabilities, leveraging the advanced dual-codebook training methodology.
 
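The README above describes a dual-codebook design: two audio tokenizers each produce their own token stream, and the two streams are merged into one sequence for the LLM before a vocoder turns the generated tokens back into audio. The diff does not show how the merge works; the sketch below is purely illustrative — the `interleave_dual_codebook` helper, the 2:3 ratio, and the token values are assumptions, not Step-Audio-TTS-3B's actual code.

```python
# Toy sketch of merging two codebook token streams into one LLM sequence.
# Hypothetical helper: the 2:3 interleave ratio and token ids are made up
# for illustration and are not taken from the Step-Audio-TTS-3B repository.

def interleave_dual_codebook(stream_a, stream_b, ratio=(2, 3)):
    """Merge two token streams, taking ratio[0] tokens from stream A
    for every ratio[1] tokens from stream B, until both are exhausted."""
    merged = []
    i = j = 0
    while i < len(stream_a) or j < len(stream_b):
        take_a = stream_a[i:i + ratio[0]]
        take_b = stream_b[j:j + ratio[1]]
        merged.extend(take_a)
        merged.extend(take_b)
        i += len(take_a)
        j += len(take_b)
    return merged

tokens_a = [0, 1, 2, 3]               # e.g. ids from the first codebook
tokens_b = [10, 11, 12, 13, 14, 15]   # e.g. ids from the second codebook
print(interleave_dual_codebook(tokens_a, tokens_b))
# [0, 1, 10, 11, 12, 2, 3, 13, 14, 15]
```

In a real pipeline the LLM would be trained on such merged sequences, and the generated tokens would be split back per codebook and fed to the matching vocoder (the README notes a separate vocoder trained specifically for humming).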