Does this model support streaming ASR, or are there any plans to open-source a streaming model?
Discussion #4, opened by Qoboty
There are currently many open-source non-streaming ASR models available. Are there any open-source streaming ASR models, or are there related plans?
I think Voxtral would work well behind an API that supports streaming input. This requires chunking the audio, for example with voice activity detection (VAD), because Voxtral has no built-in support for streaming input. I made a small demo that processes a 31-minute audio file in 60 seconds on an RTX 3090:
https://github.com/coezbek/voxtral-test/README.md#vad-based-streaming
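The VAD-based chunking idea above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the linked demo. It uses a simple RMS-energy threshold as a stand-in for a real VAD model, and `vad_segments`, `stream_transcribe`, and `transcribe_chunk` are hypothetical names. `transcribe_chunk` stands for whatever call sends one audio chunk to Voxtral (for example, a request to an inference server). The threshold, gap length, and 30 s chunk cap are assumed values. For simplicity the sketch scans a complete buffer; a live stream would apply the same per-frame logic as frames arrive.

```python
import math
from typing import Callable, Iterator, List, Optional, Sequence, Tuple


def frame_rms(samples: Sequence[float], start: int, end: int) -> float:
    """Root-mean-square energy of samples[start:end]."""
    n = end - start
    if n <= 0:
        return 0.0
    return math.sqrt(sum(s * s for s in samples[start:end]) / n)


def vad_segments(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,
    threshold: float = 0.01,    # assumed RMS level above which a frame counts as speech
    max_gap_ms: int = 300,      # assumed: silence longer than this closes a segment
    max_chunk_s: float = 30.0,  # assumed cap on the chunk length sent to the model
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of speech regions using an energy-based VAD."""
    frame_len = int(sample_rate * frame_ms / 1000)
    max_gap_frames = max_gap_ms // frame_ms
    max_chunk = int(max_chunk_s * sample_rate)
    segments: List[Tuple[int, int]] = []
    seg_start: Optional[int] = None
    last_speech_end = 0
    silent_frames = 0
    for start in range(0, len(samples), frame_len):
        end = min(start + frame_len, len(samples))
        if frame_rms(samples, start, end) >= threshold:
            if seg_start is None:
                seg_start = start
            last_speech_end = end
            silent_frames = 0
            # Force a split when a segment grows past the assumed model window.
            if last_speech_end - seg_start >= max_chunk:
                segments.append((seg_start, last_speech_end))
                seg_start = None
        elif seg_start is not None:
            silent_frames += 1
            if silent_frames > max_gap_frames:
                # Trailing silence is dropped: the segment ends at the last speech frame.
                segments.append((seg_start, last_speech_end))
                seg_start = None
    if seg_start is not None:
        segments.append((seg_start, last_speech_end))
    return segments


def stream_transcribe(
    samples: Sequence[float],
    sample_rate: int,
    transcribe_chunk: Callable[[Sequence[float]], str],  # hypothetical call into Voxtral
) -> Iterator[Tuple[float, float, str]]:
    """Yield (start_s, end_s, text) for each speech chunk, in order."""
    for start, end in vad_segments(samples, sample_rate):
        yield start / sample_rate, end / sample_rate, transcribe_chunk(samples[start:end])
```

Because each chunk is transcribed independently, the caller can emit text chunk by chunk instead of waiting for the whole file; a dedicated VAD model would be more robust than an energy threshold on noisy audio. For scale, the figure quoted above (31 minutes, i.e. 1,860 s of audio, in 60 s) corresponds to about 31 times faster than real time.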
For streaming input, Kyutai recently announced Kyutai STT: https://kyutai.org/next/tts