
CosyVoice2-0.5B-EU — FR/DE Zero-Shot Voice Cloning (CosyVoice2)

Europeanized CosyVoice2 for French & German.
Plug-and-play zero-shot voice cloning with streaming support, bilingual training (FR+DE), and a simple CLI via the companion PyPI package.

👉 PyPI: cosyvoice2-eu (current: 0.2.7) at https://pypi.org/project/cosyvoice2-eu/
👉 Demo: https://horstmann.tech/cosyvoice2-demo/
👉 Built on: FunAudioLLM CosyVoice2 (semantic LM + chunk-aware flow + HiFi-GAN)


TL;DR

High-quality French/German zero-shot TTS (text + short reference audio) built on CosyVoice2. Optimized for sentence-to-paragraph narration, bilingual FR+DE adaptation, and easy local inference. While this model is optimized for French and German, it remains fully compatible with the original CosyVoice2 languages — English, Chinese, Japanese, Korean, and their dialects.


Quickstart (CLI)

Install:

pip install cosyvoice2-eu

French example:

cosy2-eu --text "Salut ! Je vous présente CosyVoice 2, un système de synthèse vocale très avancé." --prompt path/to/french_ref.wav --out out_fr.wav

German example:

cosy2-eu --text "Hallo! Ich präsentiere CosyVoice 2 – ein fortschrittliches TTS-System." --prompt path/to/german_ref.wav --out out_de.wav
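
The French and German commands above use the same three flags, so batch narration can be scripted. Here is a minimal Python sketch, assuming only the `--text`, `--prompt`, and `--out` flags shown above. `build_command` and `synthesize_all` are hypothetical helper names (not part of the cosyvoice2-eu package), and the file paths are placeholders:

```python
import shlex
import subprocess


def build_command(text, prompt_wav, out_wav):
    """Return the cosy2-eu argument list for one utterance."""
    # Only the three flags shown in the Quickstart are used here.
    return [
        "cosy2-eu",
        "--text", text,
        "--prompt", str(prompt_wav),
        "--out", str(out_wav),
    ]


def synthesize_all(jobs, dry_run=True):
    """Print (dry run) or execute one cosy2-eu call per (text, prompt, out) job."""
    for text, prompt_wav, out_wav in jobs:
        cmd = build_command(text, prompt_wav, out_wav)
        if dry_run:
            # Show the shell-quoted command without running it.
            print(shlex.join(cmd))
        else:
            # Requires `pip install cosyvoice2-eu`; raises if the CLI fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    jobs = [
        ("Salut ! Je vous présente CosyVoice 2.", "path/to/french_ref.wav", "out_fr.wav"),
        ("Hallo! Ich präsentiere CosyVoice 2.", "path/to/german_ref.wav", "out_de.wav"),
    ]
    synthesize_all(jobs)  # dry run by default; pass dry_run=False to synthesize
```

Passing the arguments as a list avoids shell-quoting problems with accents, exclamation marks, and spaces in the text.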

First run downloads the model from this repo and caches it locally.
Tip: For style control, you can experiment with a text pattern of the form "<style>. <|endofprompt|> <text>", e.g., "Speak cheerfully. <|endofprompt|> Hallo! Wie geht es Ihnen heute?"
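
The style pattern in the tip can be assembled programmatically. This is a small sketch that only builds the string; `with_style` is a hypothetical helper name, and how the model responds to a given style instruction is something to test empirically:

```python
END_OF_PROMPT = "<|endofprompt|>"


def with_style(style, text):
    """Compose '<style>. <|endofprompt|> <text>' as described in the tip."""
    style = style.strip()
    # Make sure the style instruction ends with sentence punctuation.
    if not style.endswith((".", "!", "?")):
        style += "."
    return f"{style} {END_OF_PROMPT} {text.strip()}"


if __name__ == "__main__":
    print(with_style("Speak cheerfully", "Hallo! Wie geht es Ihnen heute?"))
```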


What you get

  • Zero-shot voice cloning for FR/DE (reference audio → cloned timbre & style).
  • Bilingual adaptation (FR+DE) on top of CosyVoice2 for stronger data efficiency.
  • Streaming & non-streaming synthesis supported by the underlying architecture.
  • Simple local inference: one pip install, one CLI (cosy2-eu).
  • Interoperable components (text→semantic LM, flow decoder, HiFi-GAN vocoder).

Also compatible with original CosyVoice2 languages (EN/ZH/JA/KO & dialects).


Inputs / Outputs

  • Input: text (FR/DE) + short reference audio (mono WAV recommended).
  • Output: synthesized WAV cloning the reference speaker’s timbre, speaking the input text in FR/DE.
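
Since a mono WAV reference is recommended, it can help to inspect a prompt file before synthesis. This is a minimal sketch using Python's standard-library `wave` module, which reads uncompressed PCM WAV files. `describe_wav` is a hypothetical helper name, and the card does not state a required sample rate or duration, so none is enforced here:

```python
import wave


def describe_wav(path):
    """Return channel count, sample rate, and duration of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()
    return {
        "channels": channels,
        "sample_rate": rate,
        "duration_s": frames / rate,
        "is_mono": channels == 1,
    }


if __name__ == "__main__":
    import sys

    # Usage: python check_ref.py path/to/french_ref.wav
    if len(sys.argv) > 1:
        info = describe_wav(sys.argv[1])
        print(info)
        if not info["is_mono"]:
            print("Reference is not mono; consider downmixing before cloning.")
```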

Notes & limitations

  • FR/DE were adapted under constrained open-data budgets; extreme edge cases (very noisy prompts, long numerics, heavy code-switching) may require careful prompting or additional fine-tuning.
  • Voice cloning carries misuse risks (impersonation, fraud). Use only with consent and follow local laws/policies.

License & attribution

  • License: Apache-2.0 (see card metadata / repo).
  • Built on CosyVoice2 by FunAudioLLM; please cite their work (see below).

If you use CosyVoice2-0.5B-EU in research or products, please add a short acknowledgment and share feedback or samples—we’re continuously improving FR/DE expressiveness and robustness.
