SpeechT5 TTS

This is a re-upload of the microsoft/speecht5_tts model.

Model description

SpeechT5 is a unified-modal encoder-decoder model developed by Microsoft and pre-trained on both speech and text. This checkpoint is fine-tuned for text-to-speech (speech synthesis).

Usage

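```python
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
from datasets import load_dataset
import torch
import soundfile as sf

# Replace YOUR_USERNAME/YOUR_REPO_NAME with the Hub id of this repository.
processor = SpeechT5Processor.from_pretrained("YOUR_USERNAME/YOUR_REPO_NAME")
model = SpeechT5ForTextToSpeech.from_pretrained("YOUR_USERNAME/YOUR_REPO_NAME")
# HiFi-GAN vocoder that turns the predicted spectrogram into a waveform.
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# Tokenize the input text.
inputs = processor(text="Hello, how are you?", return_tensors="pt")

# Load a speaker embedding (x-vector) that selects the voice; index 7306 picks one speaker.
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)  # add a batch dimension

# Generate the spectrogram and pass it through the vocoder to obtain the waveform.
speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)

# Save the waveform at a 16 kHz sample rate.
sf.write("speech.wav", speech.numpy(), samplerate=16000)
```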
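The snippet above depends on the third-party soundfile package only to save the result. If a plain WAV file is all you need, the following is a minimal sketch using Python's standard-library `wave` module instead. `write_wav_16k` is a hypothetical helper, not part of transformers; it assumes the waveform is mono float audio roughly in [-1.0, 1.0] at 16 kHz (the rate the snippet passes to `sf.write`) and clips anything outside that range.

```python
import array
import sys
import wave


def write_wav_16k(samples, path, sample_rate=16000):
    """Hypothetical helper: write mono float samples to a 16-bit PCM WAV file."""
    # Clip each sample to [-1, 1] and scale it to the signed 16-bit range.
    pcm = array.array("h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples))
    # WAV data is little-endian, so swap bytes on big-endian machines.
    if sys.byteorder == "big":
        pcm.byteswap()
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
```

With the `speech` tensor from the usage snippet, this would be called as `write_wav_16k(speech.tolist(), "speech.wav")`. Converting to 16-bit integers keeps the file playable in virtually any audio tool, at the cost of the float precision that soundfile can preserve.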

Dataset used to train xiaozhongabc/my-speecht5-tts