CosyVoice2-0.5B-EU — FR/DE Zero-Shot Voice Cloning (CosyVoice2)
Europeanized CosyVoice2 for French & German.
Plug-and-play zero-shot voice cloning with streaming support, bilingual training (FR+DE), and a simple CLI via the companion PyPI package.
👉 PyPI: cosyvoice2-eu (current: 0.2.7) at https://pypi.org/project/cosyvoice2-eu/
👉 Demo: https://horstmann.tech/cosyvoice2-demo/
👉 Built on: FunAudioLLM CosyVoice2 (semantic LM + chunk-aware flow + HiFi-GAN)
TL;DR
High-quality French/German zero-shot TTS (text + short reference audio) built on CosyVoice2. Optimized for sentence-to-paragraph narration, bilingual FR+DE adaptation, and easy local inference. While this model is optimized for French and German, it remains fully compatible with the original CosyVoice2 languages — English, Chinese, Japanese, Korean, and their dialects.
Quickstart (CLI)
Install:
pip install cosyvoice2-eu
French example:
cosy2-eu --text "Salut ! Je vous présente CosyVoice 2, un système de synthèse vocale très avancé." --prompt path/to/french_ref.wav --out out_fr.wav
German example:
cosy2-eu --text "Hallo! Ich präsentiere CosyVoice 2 – ein fortschrittliches TTS-System." --prompt path/to/german_ref.wav --out out_de.wav
First run downloads the model from this repo and caches it locally.
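For more than one sentence, the CLI call above can be scripted. The following is a minimal sketch, not part of the package: it assumes only the three flags shown in the examples (--text, --prompt, --out), and the helper names build_command and synthesize_all as well as the utt_NNN.wav naming are made up for illustration. It defaults to a dry run so the commands can be inspected before anything is executed.

```python
import shlex
import subprocess
from pathlib import Path


def build_command(text: str, prompt_wav: str, out_wav: str) -> list:
    """Return the argv list for one cosy2-eu call (hypothetical helper).

    Uses only the flags that appear in the Quickstart examples.
    """
    return ["cosy2-eu", "--text", text, "--prompt", prompt_wav, "--out", out_wav]


def synthesize_all(sentences, prompt_wav, out_dir, dry_run=True):
    """Build one command per sentence and, unless dry_run, execute them.

    Output file names (utt_000.wav, ...) are an assumption of this sketch.
    """
    out_path = Path(out_dir)
    commands = []
    for index, sentence in enumerate(sentences):
        out_wav = out_path / f"utt_{index:03d}.wav"
        commands.append(build_command(sentence, prompt_wav, str(out_wav)))
    if not dry_run:
        out_path.mkdir(parents=True, exist_ok=True)
        for command in commands:
            # check=True raises CalledProcessError if the CLI exits non-zero
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    demo = synthesize_all(
        ["Bonjour à tous.", "Guten Morgen zusammen."],
        "path/to/ref.wav",
        "out",
    )
    for command in demo:
        print(shlex.join(command))  # shell-quoted form, for inspection only
```

Passing the arguments as a list avoids shell quoting problems with apostrophes and exclamation marks in French or German text. Set dry_run=False once the printed commands look right.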
Tip: You can experiment with prompts for style control using "<style>. <|endofprompt|> <text>", e.g., "Speak cheerfully. <|endofprompt|> Hallo! Wie geht es Ihnen heute?"
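The style-prompt pattern from the tip can be assembled with a small helper, which keeps the separator token consistent across experiments. This is only a sketch of the string format quoted above; the function name with_style is hypothetical, and how strongly the model follows a given style instruction is something to test rather than assume.

```python
END_OF_PROMPT = "<|endofprompt|>"  # separator token as written in the tip


def with_style(style: str, text: str) -> str:
    """Format '<style>. <|endofprompt|> <text>' (hypothetical helper)."""
    # Drop any trailing period so the result has exactly one before the token.
    style = style.strip().rstrip(".")
    return f"{style}. {END_OF_PROMPT} {text.strip()}"


print(with_style("Speak cheerfully", "Hallo! Wie geht es Ihnen heute?"))
# Speak cheerfully. <|endofprompt|> Hallo! Wie geht es Ihnen heute?
```

The returned string is what would be passed as the --text value.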
What you get
- Zero-shot voice cloning for FR/DE (reference audio → cloned timbre & style).
- Bilingual adaptation (FR+DE) on top of CosyVoice2 for stronger data efficiency.
- Streaming & non-streaming synthesis supported by the underlying architecture.
- Simple local inference: one pip install, one CLI (cosy2-eu).
- Interoperable components (text→semantic LM, flow decoder, HiFi-GAN vocoder).
Also compatible with original CosyVoice2 languages (EN/ZH/JA/KO & dialects).
Inputs / Outputs
- Input: text (FR/DE) + short reference audio (mono WAV recommended).
- Output: synthesized WAV cloning the reference speaker’s timbre, speaking the input text in FR/DE.
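Since a mono WAV is recommended for the reference audio, it can help to inspect a file before using it as a prompt. The sketch below uses only Python's standard wave module, which reads PCM WAV headers; the helper names are hypothetical, and the silent demo file exists only to make the example runnable. The card does not specify a required sample rate or prompt length, so the code reports these values rather than enforcing any.

```python
import os
import tempfile
import wave


def describe_wav(path: str) -> dict:
    """Read channel count, sample rate and duration from a PCM WAV header."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def write_silence(path: str, seconds: float = 1.0, sample_rate: int = 16000, channels: int = 1) -> None:
    """Write a silent 16-bit PCM WAV; used here only to demo the check."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * channels * int(seconds * sample_rate))


# Demo on a generated file; replace demo_path with a real reference recording.
demo_path = os.path.join(tempfile.mkdtemp(), "ref.wav")
write_silence(demo_path, seconds=2.0)
info = describe_wav(demo_path)
print(info)
if info["channels"] != 1:
    print("Reference audio is not mono; consider downmixing it first.")
```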
Notes & limitations
- FR/DE were adapted under constrained open-data budgets; extreme edge cases (very noisy prompts, long numerics, heavy code-switching) may require careful prompting or additional fine-tuning.
- Voice cloning carries misuse risks (impersonation, fraud). Use only with consent and follow local laws/policies.
License & attribution
- License: Apache-2.0 (see card metadata / repo).
- Built on CosyVoice2 by FunAudioLLM; please cite their work (see below).
Links
- PyPI (inference CLI): https://pypi.org/project/cosyvoice2-eu/
- Upstream project: https://github.com/FunAudioLLM/CosyVoice
- CosyVoice2 paper & page: https://arxiv.org/abs/2412.10117 • https://funaudiollm.github.io/cosyvoice2/
If you use CosyVoice2-0.5B-EU in research or products, please add a short acknowledgment and share feedback or samples—we’re continuously improving FR/DE expressiveness and robustness.
Model tree for Luka512/CosyVoice2-0.5B-EU
- Base model: FunAudioLLM/CosyVoice2-0.5B