Does this model support streaming ASR, or are there any plans to open-source a streaming model?

by Qoboty - opened Jul 16

Jul 16

There are already many open-source non-streaming ASR models available. Are there any open-source streaming ASR models, or any plans to release one?

oezi13

Jul 16

I think Voxtral would work pretty well behind an API that accepts streaming input. That requires chunking/VAD on the API side, of course, because Voxtral has no built-in streaming input. I made a small demo which can process a 31-minute audio file in 60 seconds on an RTX 3090:

https://github.com/coezbek/voxtral-test/README.md#vad-based-streaming
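
To illustrate what "chunking/VAD" means here, below is a minimal sketch of the general idea, not the code from the demo linked above. It assumes audio arrives as fixed-length frames of mono float samples, uses a plain RMS energy threshold as a stand-in for a real VAD model, and takes a hypothetical `transcribe` callback where a call to Voxtral (or any non-streaming model) would go. The names `vad_chunks` and `stream_transcribe` and all parameter values are illustrative assumptions.

```python
import math
from typing import Callable, Iterable, Iterator, List

Frame = List[float]  # one fixed-length frame of mono samples in [-1, 1]


def rms(frame: Frame) -> float:
    """Root-mean-square energy of one frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def vad_chunks(
    frames: Iterable[Frame],
    threshold: float = 0.01,   # assumed energy threshold, tune per input
    max_silence: int = 10,     # silent frames that close an utterance
    max_frames: int = 1000,    # hard cap so a chunk never grows unbounded
) -> Iterator[List[float]]:
    """Yield one sample buffer per detected utterance."""
    buf: List[float] = []
    n_frames = 0
    silence = 0
    for frame in frames:
        voiced = rms(frame) >= threshold
        if not buf and not voiced:
            continue  # skip silence before an utterance starts
        buf.extend(frame)
        n_frames += 1
        silence = 0 if voiced else silence + 1
        if silence >= max_silence or n_frames >= max_frames:
            yield buf
            buf, n_frames, silence = [], 0, 0
    if buf:
        yield buf  # flush whatever is left when the stream ends


def stream_transcribe(
    frames: Iterable[Frame],
    transcribe: Callable[[List[float]], str],  # hypothetical model call
) -> Iterator[str]:
    """Emit one transcript per utterance as soon as it is complete."""
    for chunk in vad_chunks(frames):
        yield transcribe(chunk)
```

Each chunk is sent to the model as soon as a pause is detected, so latency is bounded by utterance length plus inference time rather than by the length of the whole recording. In practice a trained VAD would likely segment more reliably than an energy threshold.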

For native streaming input, Kyutai STT was recently announced: https://kyutai.org/next/tts
