CosyVoice2-0.5B-EU — FR/DE Zero-Shot Voice Cloning (CosyVoice2)

Europeanized CosyVoice2 for French & German.
Plug-and-play zero-shot voice cloning with streaming support, bilingual training (FR+DE), and a simple CLI via the companion PyPI package.

👉 PyPI: cosyvoice2-eu (current: 0.2.7) at https://pypi.org/project/cosyvoice2-eu/
👉 Demo: https://horstmann.tech/cosyvoice2-demo/
👉 Built on: FunAudioLLM CosyVoice2 (semantic LM + chunk-aware flow + HiFi-GAN)

TL;DR

High-quality French/German zero-shot TTS (text + short reference audio) built on CosyVoice2. Optimized for sentence-to-paragraph narration, bilingual FR+DE adaptation, and easy local inference. While this model is optimized for French and German, it remains fully compatible with the original CosyVoice2 languages — English, Chinese, Japanese, Korean, and their dialects.

Quickstart (CLI)

Install:

pip install cosyvoice2-eu

French example:

cosy2-eu \
  --text "Salut ! Je vous présente CosyVoice 2, un système de synthèse vocale très avancé." \
  --prompt path/to/french_ref.wav \
  --out out_fr.wav

German example:

cosy2-eu \
  --text "Hallo! Ich präsentiere CosyVoice 2 – ein fortschrittliches TTS-System." \
  --prompt path/to/german_ref.wav \
  --out out_de.wav

First run downloads the model from this repo and caches it locally.
Tip: You can experiment with prompts for style control using "<style>. <|endofprompt|> <text>", e.g., "Speak cheerfully. <|endofprompt|> Hallo! Wie geht es Ihnen heute?"
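
The style-prompt format from the tip can also be assembled programmatically. The sketch below is a minimal illustration and not part of the cosyvoice2-eu package: the helper names styled_text and cosy2_eu_command are hypothetical, and only the --text, --prompt, and --out flags shown in the Quickstart are used.

```python
import shlex

END_OF_PROMPT = "<|endofprompt|>"  # separator token from the tip above


def styled_text(style: str, text: str) -> str:
    """Join a style instruction and the text as "<style>. <|endofprompt|> <text>"."""
    # Normalize so the instruction ends with exactly one period.
    instruction = style.strip().rstrip(".") + "."
    return f"{instruction} {END_OF_PROMPT} {text.strip()}"


def cosy2_eu_command(text: str, prompt_wav: str, out_wav: str) -> list[str]:
    """Build the argument list for the cosy2-eu CLI (hypothetical helper)."""
    return ["cosy2-eu", "--text", text, "--prompt", prompt_wav, "--out", out_wav]


cmd = cosy2_eu_command(
    styled_text("Speak cheerfully", "Hallo! Wie geht es Ihnen heute?"),
    "path/to/german_ref.wav",
    "out_de.wav",
)
print(shlex.join(cmd))  # shell-quoted command line, ready to copy or run
```

To run the command directly from Python, pass the list to subprocess.run(cmd, check=True); this assumes the package is installed so that cosy2-eu is on the PATH.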

What you get

Zero-shot voice cloning for FR/DE (reference audio → cloned timbre & style).
Bilingual adaptation (FR+DE) on top of CosyVoice2; training both languages jointly is intended to improve data efficiency.
Streaming & non-streaming synthesis supported by the underlying architecture.
Simple local inference: one pip install, one CLI (cosy2-eu).
Interoperable components (text→semantic LM, flow decoder, HiFi-GAN vocoder).

Also compatible with original CosyVoice2 languages (EN/ZH/JA/KO & dialects).

Inputs / Outputs

Input: text (FR/DE) + short reference audio (mono WAV recommended).
Output: synthesized WAV cloning the reference speaker’s timbre, speaking the input text in FR/DE.
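
Since a mono WAV is recommended for the reference audio, it can help to inspect a file before synthesis. The following is a minimal sketch using Python's standard wave module, which reads uncompressed PCM WAV only. The helper names describe_wav and is_mono are hypothetical, and this check is an optional convenience, not something the CLI is known to require.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return channel count, sample rate (Hz), and duration (s) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / sample_rate
    return {"channels": channels, "sample_rate": sample_rate, "duration_s": duration}


def is_mono(path: str) -> bool:
    """True if the file has exactly one channel, as recommended for reference audio."""
    return describe_wav(path)["channels"] == 1
```

For example, describe_wav("path/to/french_ref.wav") shows whether the file would need downmixing to mono before it is passed to --prompt.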

Notes & limitations

The FR/DE adaptation was trained on a limited budget of open data; edge cases (very noisy prompts, long numeric sequences, heavy code-switching) may require careful prompting or additional fine-tuning.
Voice cloning carries misuse risks (impersonation, fraud). Use only with consent and follow local laws/policies.

License & attribution

License: Apache-2.0 (see card metadata / repo).
Built on CosyVoice2 by FunAudioLLM; please cite their work (see below).

Links

PyPI (inference CLI): https://pypi.org/project/cosyvoice2-eu/
Upstream project: https://github.com/FunAudioLLM/CosyVoice
CosyVoice2 paper & page: https://arxiv.org/abs/2412.10117 • https://funaudiollm.github.io/cosyvoice2/

If you use CosyVoice2-0.5B-EU in research or products, please add a short acknowledgment and share feedback or samples—we’re continuously improving FR/DE expressiveness and robustness.
