VincentGOURBIN/voxtral-small-4bit-mixed

This is a 4-bit quantized version of the mistralai/Voxtral-Small-24B-2507 language model.
It is provided in standard Hugging Face Transformers format and compatible with mlx.voxtral.

🔧 About this model

  • Base model: mistralai/Voxtral-Small-24B-2507
  • Quantization: 4-bit mixed precision
  • Format: Transformers-compatible (safetensors), usable with MLX and Hugging Face

🙏 Acknowledgments

Huge thanks to:

  • Mistral AI for releasing the original Voxtral-Small model
  • mlx-voxtral for the quantization tooling and MLX support

This work is a quantized derivative of mistralai/Voxtral-Small-24B-2507, made easier by the amazing work of the voxtral project.

🚀 Usage

🤗 With Hugging Face Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VincentGOURBIN/voxtral-small-4bit-mixed"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Downloads last month
117
Safetensors
Model size
24.9B params
Tensor type
F16
·
U32
·
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for VincentGOURBIN/voxtral-small-4bit-mixed

Quantized
(6)
this model