---
license: mit
tags:
- text-to-speech
- audio
- speech
language:
- en
pipeline_tag: text-to-speech
model-index:
- name: VibeVoice-1.5B
  results: []
---

# VibeVoice-1.5B

VibeVoice-1.5B is a text-to-speech (TTS) model hosted on Hugging Face. This repository provides scripts and examples to synthesize speech from text using pre-trained checkpoints.

## Repository

Hugging Face model page: [technicalheist/vibevoice-1.5b](https://huggingface.co/technicalheist/vibevoice-1.5b/)

## Requirements

* Python 3.8+
* PyTorch (with CUDA support recommended)
* [Transformers](https://github.com/huggingface/transformers)
* FFmpeg (for audio processing)
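
The requirements above can be checked before installing anything else. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, and it only tests that each dependency can be found, not that its version is compatible with the model.

```python
import importlib.util
import shutil
import sys


def check_environment(min_python=(3, 8)):
    """Return a dict mapping each listed requirement to True or False."""
    report = {
        "python": sys.version_info[:2] >= min_python,
        # find_spec locates a package without importing it.
        "torch": importlib.util.find_spec("torch") is not None,
        "transformers": importlib.util.find_spec("transformers") is not None,
        # FFmpeg is an external binary, so look for it on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }
    # CUDA is recommended, not required; query it only if torch is present.
    cuda = False
    if report["torch"]:
        try:
            import torch
            cuda = bool(torch.cuda.is_available())
        except Exception:
            cuda = False
    report["cuda"] = cuda
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:<12} {'ok' if ok else 'missing'}")
```

A `missing` entry for `cuda` means inference would fall back to the CPU, which is likely to be much slower for a model of this size.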

## Installation

Clone the repository and install dependencies. The commands below are written for a Google Colab or Jupyter notebook cell: the `!` and `%` prefixes are notebook syntax, and `/content` is Colab's working directory. In a regular shell, drop the prefixes and adjust the path.

```bash
# Clone the repository
!git clone https://huggingface.co/technicalheist/vibevoice-1.5b

# Change directory
%cd /content/vibevoice-1.5b

# Install in editable mode
!pip install -e .

# Install ffmpeg for audio handling
!apt update && apt install ffmpeg -y
```

## Usage

Run inference using the provided demo script:

```bash
!python /content/vibevoice-1.5b/demo/inference_from_file.py \
  --model_path /content/vibevoice-1.5b \
  --txt_path /content/vibevoice-1.5b/demo/text_examples/1p_abs.txt \
  --speaker_names Alice
```

### Arguments

* `--model_path`: Path to the model directory (local or Hugging Face repo name).
* `--txt_path`: Path to a text file containing the input text.
* `--speaker_names`: Names of the speakers to be used for synthesis (multiple speakers supported).
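
When the demo script is driven from Python rather than from a notebook cell, the three flags above map onto an argument list. The sketch below is an illustration under assumptions: `build_inference_command` and `run_inference` are hypothetical helper names, only the three documented flags are used, and the default `script` path assumes the current directory is the repository root.

```python
import subprocess
import sys


def build_inference_command(model_path, txt_path, speaker_names,
                            script="demo/inference_from_file.py"):
    """Assemble the argv list for the demo script from the documented flags."""
    if not speaker_names:
        raise ValueError("at least one speaker name is required")
    return [
        sys.executable, script,
        "--model_path", str(model_path),
        "--txt_path", str(txt_path),
        # Multiple speakers are passed as separate values after the flag.
        "--speaker_names", *speaker_names,
    ]


def run_inference(model_path, txt_path, speaker_names, **kwargs):
    """Run the demo script; raises CalledProcessError on a non-zero exit."""
    cmd = build_inference_command(model_path, txt_path, speaker_names, **kwargs)
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces. For example, `build_inference_command("/content/vibevoice-1.5b", "demo/text_examples/1p_abs.txt", ["Alice"])` returns the list that `run_inference` would execute.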

### Example with multiple speakers

```bash
!python /content/vibevoice-1.5b/demo/inference_from_file.py \
  --model_path /content/vibevoice-1.5b \
  --txt_path /content/vibevoice-1.5b/demo/text_examples/2p_music.txt \
  --speaker_names Alice Frank
```

## Google Colab Notebook

A ready-to-use Google Colab notebook is available for quick experimentation:

[Open in Colab](https://colab.research.google.com/drive/1KAswi0RLdXq-CouJDlzzXcD2K5XcySt1?usp=sharing)

## Output

* Generated audio files will be saved in the output directory specified in the script.
* Default output format: `.wav`
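
A generated file can be sanity-checked without playing it back. The sketch below uses Python's standard `wave` module and assumes the output is an uncompressed PCM WAV file; if the script writes floating-point WAV data, `wave` will reject it and a tool such as `ffprobe` would be needed instead. The helper name `wav_info` is hypothetical.

```python
import wave


def wav_info(path):
    """Return sample rate, channel count, and duration of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            # Duration in seconds is the frame count divided by the frame rate.
            "duration_s": frames / rate,
        }
```

A duration close to zero usually indicates that synthesis failed or that the input text file was empty.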

## License

The model card metadata declares the MIT license. Check the license terms on the [model page](https://huggingface.co/technicalheist/vibevoice-1.5b/) before use.