Aayan Mishra

Spestly

AI & ML interests

None yet

Recent Activity

Organizations

Stanford AI, OpenVINO Toolkit, LLMs, C4AI Community, Hugging Face Discord Community, Data Tonic (Alignment Lab), Odyssey Labs, Open-Neo, Lambda Go Labs

Spestly's activity

published a model about 20 hours ago
reacted to hexgrad's post with 🔥 about 23 hours ago
I wrote an article about G2P: https://hf.co/blog/hexgrad/g2p

G2P is an underrated piece of small TTS models, like offensive linemen who do a bunch of work and get no credit.

Instead of relying on explicit G2P, larger speech models implicitly learn this task by eating many thousands of hours of audio data. They often use a 500M+ parameter LLM at the front to predict latent audio tokens over a learned codebook, then decode these tokens into audio.

Kokoro instead relies on G2P preprocessing, has 82M parameters, and thus needs less audio to learn. Because of this, we can cherry-pick high-fidelity audio for training data and deliver solid speech for those voices. In turn, this excellent audio quality and lack of background noise help explain why Kokoro is very competitive in single-voice TTS Arenas.
upvoted an article 1 day ago
New activity in open-neo/Kyro-n1-3B 8 days ago
upvoted an article 8 days ago

Welcome to Inference Providers on the Hub 🔥

• 384