mistralai/Voxtral-Mini-3B-2507 · How to use automatic language recognition?

owao

26 days ago

I couldn't find a way to set the language to auto or None. I always get: Invalid language alpha2 code [type=language_alpha2, input_value='automatic', input_type=str]
And language is a required positional arg, so how is it meant to be used?

Thanks in advance

sharrnah

24 days ago

Same issue here.
Automatic language detection is mentioned, but all the libraries and example code expect a defined language.

sharrnah

24 days ago

edited 24 days ago

I think I found a workaround.

# patching optional language field
from typing import Optional
from pydantic_extra_types.language_code import LanguageAlpha2
from mistral_common.protocol.transcription.request import TranscriptionRequest as _TR

class TranscriptionRequest(_TR):
    # make it optional
    language: Optional[LanguageAlpha2] = None
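
For anyone wondering why the subclass trick works: redeclaring a field in a pydantic subclass replaces the parent's definition, so a field can become optional there. A minimal sketch with plain pydantic, where BaseRequest and PatchedRequest are hypothetical stand-ins (not the real mistral_common classes) and the parent field is assumed to be required:

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class BaseRequest(BaseModel):
    # Hypothetical stand-in for a library request model with a required field.
    model: str
    language: str


class PatchedRequest(BaseRequest):
    # Redeclaring the field overrides the parent's definition: now optional.
    language: Optional[str] = None


# The parent still rejects a request that omits the language.
try:
    BaseRequest(model="demo")
    base_rejected = False
except ValidationError:
    base_rejected = True

# The patched subclass accepts it and defaults the language to None.
patched = PatchedRequest(model="demo")
explicit = PatchedRequest(model="demo", language="en")
```

An explicit language still goes through as before; only the "missing" case changes.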



# for transcribing, use the mistral_common helper directly
# (assumes `processor`, `model` and `wav_buffer` are already set up):
import torch

repo_id = "mistralai/Voxtral-Mini-3B-2507"

openai_req = {
    "model": repo_id,
    "file": wav_buffer,  # has to be a path or io.BytesIO
    # "language": "en"   # This is now optional. Leave it out for auto detection.
}
tr = TranscriptionRequest.from_openai(openai_req)

# tokenize the transcription prompt and extract the audio features
tok = processor.tokenizer.tokenizer.encode_transcription(tr)
audio_feats = processor.feature_extractor(
    wav_buffer, sampling_rate=16000, return_tensors="pt"
).input_features.to(model.device)

with torch.no_grad():
    ids = model.generate(
        input_features=audio_feats,
        input_ids=torch.tensor([tok.tokens], device=model.device),
        max_new_tokens=500,
        num_beams=1,
    )
response = processor.batch_decode(ids, skip_special_tokens=True)[0]
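
Note that wav_buffer is not defined in the snippet above. One possible way to build an in-memory WAV file for the "file" entry, using only the standard library, is sketched below. The make_wav_buffer helper and the 440 Hz test tone are made up for illustration; in practice the samples would come from your own recording.

```python
import io
import math
import struct
import wave


def make_wav_buffer(samples, sample_rate=16000):
    """Pack float samples in [-1, 1] into an in-memory 16-bit mono WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit PCM
        wf.setframerate(sample_rate)
        # Clip each sample, scale it to int16 and pack as little-endian.
        pcm = struct.pack(
            f"<{len(samples)}h",
            *(int(max(-1.0, min(1.0, s)) * 32767) for s in samples),
        )
        wf.writeframes(pcm)
    buf.seek(0)  # rewind so the next reader starts at the RIFF header
    return buf


# Placeholder audio: one second of a 440 Hz tone at 16 kHz.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
wav_buffer = make_wav_buffer(tone)
```

The 16 kHz rate here matches the sampling_rate=16000 passed to the feature extractor above.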

owao

23 days ago

Thanks for sharing your solution, @sharrnah! I'll give it a try soon and report back!

pdrolet

14 days ago

This solution worked! In fact, I gave it to Claude.ai and "he" integrated it into the script.