
	---
	license: mit
	tags:
	- text-to-speech
	- audio
	- speech
	language:
	- en
	pipeline_tag: text-to-speech
	model-index:
	- name: VibeVoice-1.5B
	  results: []
	---


	# VibeVoice-1.5B

	VibeVoice-1.5B is a text-to-speech (TTS) model hosted on Hugging Face. This repository provides scripts and examples to synthesize speech from text using pre-trained checkpoints.

	## Repository

	Hugging Face model page: [technicalheist/vibevoice-1.5b](https://huggingface.co/technicalheist/vibevoice-1.5b/)

	## Requirements

	* Python 3.8+
	* PyTorch (with CUDA support recommended)
	* [Transformers](https://github.com/huggingface/transformers)
	* FFmpeg (for audio processing)
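
The requirements above can be checked from Python before installing anything else. The following is a minimal sketch, not part of the repository: the function name `check_requirements` and the report keys are illustrative. It only tests whether each item is visible from the current interpreter, not whether the installed versions are compatible with the model.

```python
import importlib.util
import shutil
import sys


def check_requirements():
    """Report which of the listed requirements are visible from this interpreter."""
    report = {
        # The requirements list asks for Python 3.8 or newer.
        "python_ok": sys.version_info >= (3, 8),
        # find_spec returns None for a package that is not installed, without importing it.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "transformers_installed": importlib.util.find_spec("transformers") is not None,
        # shutil.which returns None when no ffmpeg executable is on PATH.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch

            report["cuda_available"] = bool(torch.cuda.is_available())
        except (ImportError, OSError):
            # A broken PyTorch install is reported as "no CUDA" instead of crashing.
            pass
    return report


for name, ok in check_requirements().items():
    print(f"{name}: {ok}")
```

A `False` for `cuda_available` does not necessarily block inference, since CUDA is listed as recommended, not required.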

	## Installation

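	Clone the repository and install dependencies. The commands below are written for a Google Colab notebook cell: the `!` and `%cd` prefixes are notebook syntax, and `/content` is Colab's working directory. In a regular shell, drop the prefixes and adjust the paths: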

	```bash
	# Clone the repository
	!git clone https://huggingface.co/technicalheist/vibevoice-1.5b

	# Change directory
	%cd /content/vibevoice-1.5b

	# Install in editable mode
	!pip install -e .

	# Install ffmpeg for audio handling
	!apt update && apt install ffmpeg -y
	```

	## Usage

	Run inference using the provided demo script:

	```bash
	!python /content/vibevoice-1.5b/demo/inference_from_file.py \
	--model_path /content/vibevoice-1.5b \
	--txt_path /content/vibevoice-1.5b/demo/text_examples/1p_abs.txt \
	--speaker_names Alice
	```

	### Arguments

	* `--model_path`: Path to the model directory (local or Hugging Face repo name).
	* `--txt_path`: Path to a text file containing the input text.
	* `--speaker_names`: One or more speaker names to use for synthesis, separated by spaces.

	### Example with multiple speakers

	```bash
	!python /content/vibevoice-1.5b/demo/inference_from_file.py \
	--model_path /content/vibevoice-1.5b \
	--txt_path /content/vibevoice-1.5b/demo/text_examples/2p_music.txt \
	--speaker_names Alice Frank
	```
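
The single-speaker and multi-speaker invocations differ only in the text file and the speaker list, so they can be wrapped in a small Python helper. This is a sketch under assumptions: the helper names `build_inference_command` and `run_inference` are hypothetical and not part of the repository, and only the three flags documented above are used.

```python
import subprocess
import sys


def build_inference_command(script_path, model_path, txt_path, speaker_names):
    """Return the argument list for the demo script, one list element per argument."""
    if not speaker_names:
        raise ValueError("at least one speaker name is required")
    return [
        sys.executable, script_path,
        "--model_path", model_path,
        "--txt_path", txt_path,
        # Speaker names follow the flag as separate arguments.
        "--speaker_names", *speaker_names,
    ]


def run_inference(script_path, model_path, txt_path, speaker_names):
    """Run the demo script; raises CalledProcessError on a non-zero exit code."""
    command = build_inference_command(script_path, model_path, txt_path, speaker_names)
    subprocess.run(command, check=True)


# Paths taken from the Colab examples in this README.
repo = "/content/vibevoice-1.5b"
command = build_inference_command(
    f"{repo}/demo/inference_from_file.py",
    repo,
    f"{repo}/demo/text_examples/2p_music.txt",
    ["Alice", "Frank"],
)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces. Calling `run_inference` with the same four values starts the actual synthesis.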

	## Google Colab Notebook

	A ready-to-use Google Colab notebook is available for quick experimentation:

	[Open in Colab](https://colab.research.google.com/drive/1KAswi0RLdXq-CouJDlzzXcD2K5XcySt1?usp=sharing)

	## Output

	* Generated audio files are saved to the output directory set in the demo script.
	* Default output format: `.wav`
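
To confirm that a run produced usable audio, the generated file can be inspected with Python's standard `wave` module. This is a minimal sketch that assumes the output is a PCM-encoded `.wav` file; `wave` does not read other encodings such as floating-point WAV. The function name `wav_info` is illustrative.

```python
import wave


def wav_info(path):
    """Return channel count, sample rate, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate_hz": rate,
            # Duration in seconds is the frame count divided by frames per second.
            "duration_s": frames / rate,
        }
```

Call it with the path of a generated file, for example `wav_info("path/to/output.wav")`, where the path is a placeholder for wherever the script wrote its output.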

	## License

	The model card metadata declares the MIT license. Check the license terms on the [model page](https://huggingface.co/technicalheist/vibevoice-1.5b/) before use.