---
license: mit
tags:
- text-to-speech
- audio
- speech
language:
- en
pipeline_tag: text-to-speech
model-index:
- name: VibeVoice-1.5B
  results: []
---

# VibeVoice-1.5B

VibeVoice-1.5B is a text-to-speech (TTS) model hosted on Hugging Face. This repository provides scripts and examples for synthesizing speech from text with the pre-trained checkpoints.

## Repository

Hugging Face model page: [technicalheist/vibevoice-1.5b](https://huggingface.co/technicalheist/vibevoice-1.5b/)

## Requirements

* Python 3.8+
* PyTorch (CUDA support recommended)
* [Transformers](https://github.com/huggingface/transformers)
* FFmpeg (for audio processing)

## Installation

Clone the repository and install the dependencies. The commands below are written for a Google Colab or Jupyter notebook cell, where `!` runs a shell command and `%cd` changes the notebook's working directory. In a regular terminal, drop the `!` prefix, use `cd` instead of `%cd`, replace the `/content/...` paths (Colab's default working directory) with the location of your clone, and run `apt` with `sudo` if you are not root.

```bash
# Clone the repository
!git clone https://huggingface.co/technicalheist/vibevoice-1.5b

# Change directory
%cd /content/vibevoice-1.5b

# Install in editable mode
!pip install -e .

# Install ffmpeg for audio handling
!apt update && apt install ffmpeg -y
```

## Usage

Run inference with the provided demo script:

```bash
!python /content/vibevoice-1.5b/demo/inference_from_file.py \
  --model_path /content/vibevoice-1.5b \
  --txt_path /content/vibevoice-1.5b/demo/text_examples/1p_abs.txt \
  --speaker_names Alice
```

### Arguments

* `--model_path`: Path to the model directory (a local path or a Hugging Face repo name).
* `--txt_path`: Path to a text file containing the input text.
* `--speaker_names`: One or more space-separated speaker names to use for synthesis; pass several names for multi-speaker input.
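If you run the demo repeatedly, a small shell wrapper can keep the repository path in one place and catch a missing text file before the model loads. The function below is a hypothetical helper, not part of this repository: the names `vibevoice_tts`, `REPO_DIR`, and `DRY_RUN` are introduced here for illustration, `REPO_DIR` defaults to the Colab path used above (an assumption), and the function only forwards the three documented arguments to `demo/inference_from_file.py`.

```bash
# Hypothetical helper (not shipped with the repository).
# REPO_DIR is an assumption: point it at your local clone.
REPO_DIR="${REPO_DIR:-/content/vibevoice-1.5b}"

# Usage: vibevoice_tts <txt_path> <speaker> [speaker ...]
# Returns 2 on bad usage, 1 if the text file does not exist.
vibevoice_tts() {
  if [ "$#" -lt 2 ]; then
    echo "usage: vibevoice_tts <txt_path> <speaker> [speaker ...]" >&2
    return 2
  fi
  vv_txt_path=$1
  shift  # the remaining arguments are the speaker names
  if [ ! -f "$vv_txt_path" ]; then
    echo "error: text file not found: $vv_txt_path" >&2
    return 1
  fi
  # Rebuild the positional parameters as the full demo command.
  set -- python "$REPO_DIR/demo/inference_from_file.py" \
    --model_path "$REPO_DIR" \
    --txt_path "$vv_txt_path" \
    --speaker_names "$@"
  # With DRY_RUN=1, print the command instead of running it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 vibevoice_tts "$REPO_DIR/demo/text_examples/2p_music.txt" Alice Frank` previews the multi-speaker command, and dropping `DRY_RUN=1` runs it. In Colab, each `!` line runs in its own shell, so define and call the function in the same `%%bash` cell.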
### Example with multiple speakers

```bash
!python /content/vibevoice-1.5b/demo/inference_from_file.py \
  --model_path /content/vibevoice-1.5b \
  --txt_path /content/vibevoice-1.5b/demo/text_examples/2p_music.txt \
  --speaker_names Alice Frank
```

## Google Colab Notebook

A ready-to-use Google Colab notebook is available for quick experimentation: [Open in Colab](https://colab.research.google.com/drive/1KAswi0RLdXq-CouJDlzzXcD2K5XcySt1?usp=sharing)

## Output

* Generated audio files are saved to the output directory configured in the demo script.
* Default output format: `.wav`

## License

The model card metadata lists the MIT license. Check the license terms on the [model page](https://huggingface.co/technicalheist/vibevoice-1.5b/) before use.