VibeVoice-1.5B

VibeVoice-1.5B is a text-to-speech (TTS) model hosted on Hugging Face. This repository provides scripts and examples to synthesize speech from text using pre-trained checkpoints.

Repository

Hugging Face model page: technicalheist/vibevoice-1.5b

Requirements

  • Python 3.8+
  • PyTorch (with CUDA support recommended)
  • Transformers
  • FFmpeg (for audio processing)
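
Before installing, it can help to confirm which of these requirements are already present. The following is a minimal sketch, not part of the repository: the function name check_environment is hypothetical, and it only checks that the packages can be found, not that their versions are compatible.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Report which of the listed requirements are present on this machine."""
    report = {
        "python_ok": sys.version_info >= (3, 8),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "transformers_installed": importlib.util.find_spec("transformers") is not None,
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            # Imported lazily so the check also runs on machines without PyTorch.
            import torch
            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken PyTorch install is treated as "no CUDA" rather than a crash.
            report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {ok}")
```

A False value for cuda_available does not necessarily prevent inference, but CPU-only synthesis is likely to be slow.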

Installation

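Clone the repository and install dependencies. The commands below use notebook syntax (the ! and % prefixes) and assume a Google Colab runtime whose working directory is /content. In a regular shell, drop the prefixes and adjust the paths: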

# Clone the repository
!git clone https://huggingface.co/technicalheist/vibevoice-1.5b

# Change directory
%cd /content/vibevoice-1.5b

# Install in editable mode
!pip install -e .

# Install ffmpeg for audio handling
!apt update && apt install ffmpeg -y

Usage

Run inference using the provided demo script:

!python /content/vibevoice-1.5b/demo/inference_from_file.py \
  --model_path /content/vibevoice-1.5b \
  --txt_path /content/vibevoice-1.5b/demo/text_examples/1p_abs.txt \
  --speaker_names Alice

Arguments

  • --model_path: Path to a local model directory, or a Hugging Face repo name.
  • --txt_path: Path to a text file containing the input text.
  • --speaker_names: One or more speaker names, separated by spaces (for example, Alice Frank).
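
Outside a notebook, the same call can be made from Python. The sketch below assembles the command from the three arguments listed above and runs it with subprocess. The helper names build_inference_command and run_inference are hypothetical, and the sketch assumes the script accepts no other required arguments.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_command(repo_dir, txt_path, speaker_names):
    """Build the argument list for demo/inference_from_file.py."""
    if not speaker_names:
        raise ValueError("at least one speaker name is required")
    repo_dir = Path(repo_dir)
    return [
        sys.executable,
        str(repo_dir / "demo" / "inference_from_file.py"),
        "--model_path", str(repo_dir),
        "--txt_path", str(txt_path),
        # Each speaker name becomes its own argument, as in the shell examples.
        "--speaker_names", *speaker_names,
    ]


def run_inference(repo_dir, txt_path, speaker_names):
    """Run the demo script; raises CalledProcessError on a non-zero exit."""
    command = build_inference_command(repo_dir, txt_path, speaker_names)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    repo = "/content/vibevoice-1.5b"
    text_file = f"{repo}/demo/text_examples/1p_abs.txt"
    # Only prints the command; call run_inference(...) to execute it.
    print(build_inference_command(repo, text_file, ["Alice"]))
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.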

Example with multiple speakers

!python /content/vibevoice-1.5b/demo/inference_from_file.py \
  --model_path /content/vibevoice-1.5b \
  --txt_path /content/vibevoice-1.5b/demo/text_examples/2p_music.txt \
  --speaker_names Alice Frank
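
With several speakers, the number of names passed to --speaker_names should presumably match the number of speakers in the text file. The sketch below is a pre-flight check under a loud assumption: it guesses that each line of the script starts with a label of the form "Speaker <number>:", which this README does not state. Inspect the files in demo/text_examples to confirm the actual format before relying on it. The function names are hypothetical.

```python
import re

# ASSUMPTION: script lines look like "Speaker 1: some text". This is a guess
# at the input layout, not something documented in this README.
SPEAKER_LINE = re.compile(r"^\s*Speaker\s+(\d+)\s*:", re.IGNORECASE)


def count_speakers(script_text):
    """Return the number of distinct speaker labels found in the script text."""
    labels = set()
    for line in script_text.splitlines():
        match = SPEAKER_LINE.match(line)
        if match:
            labels.add(int(match.group(1)))
    return len(labels)


def check_speaker_names(script_text, speaker_names):
    """Raise ValueError when the number of names differs from the labels found."""
    found = count_speakers(script_text)
    if found != len(speaker_names):
        raise ValueError(
            f"script has {found} speaker label(s) "
            f"but {len(speaker_names)} name(s) were given"
        )


if __name__ == "__main__":
    sample = "Speaker 1: Did you hear the new track?\nSpeaker 2: I did, twice."
    check_speaker_names(sample, ["Alice", "Frank"])  # passes silently
    print(count_speakers(sample))
```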

Google Colab Notebook

A ready-to-use Google Colab notebook is available for quick experimentation:

Open in Colab

Output

  • Generated audio files are saved to the output directory set in the demo script.
  • Default output format: .wav
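
To sanity-check a run, the generated files can be listed with their durations using only the Python standard library. This is a sketch under two assumptions: the directory name "outputs" is a placeholder rather than the script's actual default, and the files are PCM-encoded WAV, which is the only encoding the standard wave module reads.

```python
import wave
from pathlib import Path


def wav_durations(output_dir):
    """Map each .wav file name in output_dir to its duration in seconds."""
    durations = {}
    directory = Path(output_dir)
    if not directory.is_dir():
        return durations
    for path in sorted(directory.glob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            # Duration is the frame count divided by the sample rate.
            durations[path.name] = wav_file.getnframes() / wav_file.getframerate()
    return durations


if __name__ == "__main__":
    # "outputs" is a placeholder; use the directory your run actually wrote to.
    for name, seconds in wav_durations("outputs").items():
        print(f"{name}: {seconds:.2f} s")
```

If wave raises an error on a file, the file may use a non-PCM encoding such as 32-bit float, in which case a third-party audio library would be needed instead.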

License

Check the license terms on the model page before use.
