I created an easy-to-use API server with Web UI and GPU support for KittenTTS

#2
by devnen - opened

The KittenTTS model is truly impressive work by the KittenML team. The quality is outstanding for a model that's less than 25MB.

Seeing the potential for such a lightweight model, I wanted to share a project I built that makes it incredibly easy to run locally:
https://github.com/devnen/Kitten-TTS-Server

It's an enhanced FastAPI server that wraps the KittenTTS model and adds several features the original project doesn't have, such as a web UI and GPU acceleration. Setup is designed to be as simple as possible: a standard pip install that works on both Windows and Linux.


The main goal was to provide a "just works" experience: run one command to install the dependencies, then python server.py. The server automatically downloads the model, starts, and opens a web UI in your browser where you can immediately start generating audio.

Key enhancements include:
True GPU Acceleration: I've added a high-performance pipeline for NVIDIA GPUs using I/O Binding, making inference significantly faster. The installation guide provides a hassle-free method to get all the CUDA dependencies set up correctly.
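For context on what an I/O Binding pipeline does, here is a minimal sketch of GPU-bound inference with ONNX Runtime. This is an illustration, not the server's actual code: the model path, the assumption that the model takes a single input tensor, and the availability of the onnxruntime-gpu package are all assumptions.

```python
def synthesize_gpu(model_path, tokens):
    """Run one inference with tensors bound to GPU memory via I/O Binding,
    avoiding a host-to-device copy on every call.
    Illustrative sketch only; assumes onnxruntime-gpu and a CUDA device."""
    import numpy as np
    import onnxruntime as ort

    sess = ort.InferenceSession(
        model_path,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    binding = sess.io_binding()
    # Place the input tensor directly in CUDA device memory.
    x = ort.OrtValue.ortvalue_from_numpy(np.ascontiguousarray(tokens), "cuda", 0)
    binding.bind_ortvalue_input(sess.get_inputs()[0].name, x)
    # Let the runtime allocate the output on the GPU as well.
    binding.bind_output(sess.get_outputs()[0].name, "cuda")
    sess.run_with_iobinding(binding)
    # Copy the finished audio back to host memory once, at the end.
    return binding.copy_outputs_to_cpu()[0]
```

The point of binding inputs and outputs to device memory is that repeated calls don't pay a CPU-GPU transfer on every tensor, which is where much of the per-request latency goes.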

Modern Web UI: A clean interface to type text, choose from the 8 built-in voices, and generate speech instantly.

Audiobook Generation: It includes intelligent text chunking, allowing you to process very long documents or even entire books into a single audio file.
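The idea behind the chunking can be sketched roughly like this. This is a simplified illustration under my own assumptions (sentence-boundary splitting with a character budget), not the server's actual algorithm:

```python
import re


def chunk_text(text, max_chars=300):
    """Split text on sentence boundaries, then pack sentences into chunks
    no longer than max_chars. A single over-long sentence becomes its own
    chunk. Simplified sketch of how audiobook-style chunking can work."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + 1 + len(s) > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is synthesized separately and the resulting audio segments are concatenated, which is what makes very long inputs practical for a small TTS model.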

Dual API: An OpenAI-compatible /v1/audio/speech endpoint is included for easy integration, alongside a custom /tts endpoint for more control.
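Calling the OpenAI-compatible endpoint from Python might look like the sketch below. The voice name, port, model field, and response format are illustrative assumptions; check the repository's README for the actual defaults:

```python
import json
from urllib import request


def build_speech_request(text, voice="voice-1", base_url="http://localhost:8000"):
    """Build a POST request for an OpenAI-style /v1/audio/speech endpoint.
    Field values (model, voice, port) are placeholders, not confirmed defaults."""
    payload = {
        "model": "kitten-tts",
        "input": text,
        "voice": voice,
        "response_format": "wav",
    }
    return request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


# To actually fetch audio (requires the server to be running locally):
# with request.urlopen(build_speech_request("Hello!")) as resp:
#     open("out.wav", "wb").write(resp.read())
```

Because the request shape matches OpenAI's speech API, existing client code can often be pointed at the local server just by changing the base URL.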

Docker Support: Pre-configured Docker Compose files are available for both CPU and NVIDIA GPU deployment.

This project aims to make this fantastic lightweight model accessible to everyone, without the usual setup friction.

I hope you find it useful:
https://github.com/devnen/Kitten-TTS-Server
