Does this model support streaming ASR, or are there any plans to open-source a streaming model?

#4
by Qoboty - opened

Currently, there are so many open-source non-streaming ASR models available. Are there any open-source streaming ASR models, or are there any related plans?

I think Voxtral would work pretty well behind an API that accepts streaming input. That requires chunking/VAD in front of the model, of course, because Voxtral has no built-in streaming input. I made a small demo that processes a 31-minute audio file in 60 seconds on an RTX 3090:

https://github.com/coezbek/voxtral-test/README.md#vad-based-streaming
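For anyone wondering what the chunking/VAD step could look like, here is a minimal sketch. It is not the code from the demo above: it uses a plain RMS energy threshold as a stand-in for a real VAD model, and `transcribe` is a hypothetical placeholder for whatever function sends a chunk to Voxtral. The function names, thresholds, and frame sizes are all assumptions chosen for illustration.

```python
import math
from typing import Callable, Iterator, List, Sequence, Tuple


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame (0.0 for an empty frame)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def vad_chunks(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,
    threshold: float = 0.01,      # assumed energy threshold, tune per microphone
    min_silence_ms: int = 300,    # silence needed to close a chunk
    max_chunk_s: float = 30.0,    # hard cap so a chunk never grows unbounded
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of speech chunks."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    silence_needed = max(1, min_silence_ms // frame_ms)
    max_len = max(1, int(max_chunk_s * 1000 / frame_ms)) * frame_len

    chunks: List[Tuple[int, int]] = []
    start = None          # sample index where the open chunk began
    last_voiced_end = 0   # end of the most recent voiced frame
    silent_run = 0        # consecutive silent frames inside the open chunk

    for pos in range(0, len(samples), frame_len):
        frame = samples[pos:pos + frame_len]
        end = pos + len(frame)
        if frame_rms(frame) >= threshold:
            if start is None:
                start = pos
            silent_run = 0
            last_voiced_end = end
        elif start is not None:
            silent_run += 1

        # Close the chunk after enough silence or when it gets too long.
        if start is not None and (
            silent_run >= silence_needed or end - start >= max_len
        ):
            chunks.append((start, last_voiced_end))
            start, silent_run = None, 0

    if start is not None:  # flush a chunk still open at end of input
        chunks.append((start, last_voiced_end))
    return chunks


def transcribe_stream(
    samples: Sequence[float],
    sample_rate: int,
    transcribe: Callable[[Sequence[float]], str],  # hypothetical model call
    **vad_kwargs,
) -> Iterator[str]:
    """Yield one transcript per detected speech chunk, in order."""
    for start, end in vad_chunks(samples, sample_rate, **vad_kwargs):
        yield transcribe(samples[start:end])
```

In a real service the loop would consume audio frames as they arrive instead of a complete buffer, and a trained VAD would likely replace the energy threshold, since plain RMS is easily fooled by background noise.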

For native streaming input, Kyutai STT was recently announced: https://kyutai.org/next/tts
