VibeVoice-1.5B
VibeVoice-1.5B is a text-to-speech (TTS) model hosted on Hugging Face. This repository provides scripts and examples to synthesize speech from text using pre-trained checkpoints.
Repository
Hugging Face model page: technicalheist/vibevoice-1.5b
Requirements
- Python 3.8+
- PyTorch (with CUDA support recommended)
- Transformers
- FFmpeg (for audio processing)
Installation
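Clone the repository and install dependencies. The commands below use Google Colab / Jupyter notebook syntax (`!` runs a shell command, `%cd` changes the working directory):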
# Clone the repository
!git clone https://huggingface.co/technicalheist/vibevoice-1.5b
# Change directory
%cd /content/vibevoice-1.5b
# Install in editable mode
!pip install -e .
# Install ffmpeg for audio handling
!apt update && apt install ffmpeg -y
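After installation, it can help to confirm that the items under Requirements are actually present. The following is a minimal sketch rather than part of the repository: the function name `check_environment` is hypothetical, and it only checks that the packages can be found and that `ffmpeg` is on the PATH, not that their versions are compatible.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Report which of the listed requirements appear to be available."""
    return {
        # The requirements list asks for Python 3.8 or newer.
        "python_ok": sys.version_info >= (3, 8),
        # find_spec looks a top-level package up without importing it.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "transformers_installed": importlib.util.find_spec("transformers") is not None,
        # shutil.which returns None when the executable is not on PATH.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If PyTorch is installed, `torch.cuda.is_available()` can additionally be called to see whether a CUDA device is visible.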
Usage
Run inference using the provided demo script:
!python /content/vibevoice-1.5b/demo/inference_from_file.py \
--model_path /content/vibevoice-1.5b \
--txt_path /content/vibevoice-1.5b/demo/text_examples/1p_abs.txt \
--speaker_names Alice
Arguments
- --model_path: Path to the model directory (local or Hugging Face repo name).
- --txt_path: Path to a text file containing the input text.
- --speaker_names: Names of the speakers to use for synthesis (multiple speakers supported).
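Outside a notebook, the same call can be assembled from Python. The sketch below is illustrative only: `build_inference_command` and `run_inference` are hypothetical helper names, and it assumes the script lives at `demo/inference_from_file.py` inside the cloned repository and accepts exactly the three flags described above.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_command(repo_dir, txt_path, speaker_names):
    """Assemble the argument list for demo/inference_from_file.py."""
    if not speaker_names:
        raise ValueError("at least one speaker name is required")
    repo_dir = Path(repo_dir)
    return [
        sys.executable,
        str(repo_dir / "demo" / "inference_from_file.py"),
        "--model_path", str(repo_dir),
        "--txt_path", str(txt_path),
        # Each speaker name is passed as its own argument after the flag.
        "--speaker_names", *speaker_names,
    ]


def run_inference(repo_dir, txt_path, speaker_names):
    """Run the demo script; raises CalledProcessError on a non-zero exit."""
    cmd = build_inference_command(repo_dir, txt_path, speaker_names)
    subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when paths or speaker names contain spaces.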
Example with multiple speakers
!python /content/vibevoice-1.5b/demo/inference_from_file.py \
--model_path /content/vibevoice-1.5b \
--txt_path /content/vibevoice-1.5b/demo/text_examples/2p_music.txt \
--speaker_names Alice Frank
Google Colab Notebook
A ready-to-use Google Colab notebook is available for quick experimentation.
Output
- Generated audio files will be saved in the output directory specified in the script.
- Default output format: .wav
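To sanity-check a generated file, its basic properties can be read with Python's standard `wave` module. This is a sketch under the assumption that the output is PCM-encoded; `describe_wav` is a hypothetical helper, and the path passed to it depends on the output directory the script uses.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM .wav file."""
    with wave.open(str(path), "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav_file.getsampwidth(),
            # Duration is the frame count divided by frames per second.
            "duration_s": frames / rate,
        }
```

If the file is stored as floating-point rather than PCM samples, `wave.open` may raise `wave.Error`, in which case a different audio library would be needed.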
License
Check the license terms on the model page before use.