
Mistral-Small-3.2-24B-Instruct-2506 (Quantized)

This is a quantized version of togethercomputer/mistral-3.2-instruct-2506, optimized for reduced memory usage while maintaining performance.

Mistral-Small-3.2-24B-Instruct-2506 is a minor update of Mistral-Small-3.1-24B-Instruct-2503.

Quantization Details

This model has been quantized to reduce memory requirements while preserving model quality. The quantized weights are significantly smaller than the original fp16/bf16 checkpoint.
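
As a minimal loading sketch: the card does not state the quantization method, so the bitsandbytes-style 4-bit configuration below is an assumption for illustration; a checkpoint that ships pre-quantized weights (e.g. GPTQ or AWQ) would load without an explicit config.

```python
# Minimal loading sketch. The quantization method is not stated on this card,
# so the bitsandbytes 4-bit config here is an illustrative assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "vschandramourya/mistral-3.2-instruct-2506-quantized"

# Hypothetical 4-bit config; drop this if the checkpoint already contains
# pre-quantized weights (e.g. GPTQ/AWQ), which load without a config.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```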

Base Model Improvements

Small-3.2 improves in the following categories:

  • Instruction following: Small-3.2 is better at following precise instructions
  • Repetition errors: Small-3.2 produces fewer infinite generations and repetitive answers
  • Function calling: Small-3.2's function calling template is more robust (see the sketch after this list)

In all other categories Small-3.2 should match or slightly improve compared to Mistral-Small-3.1-24B-Instruct-2503.
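
Since tool use runs through the chat template, the hedged sketch below shows what function calling looks like in practice, assuming the checkpoint ships a Mistral-style chat template with tool support (transformers >= 4.42); the get_weather tool is illustrative, not part of this card.

```python
# Hedged function-calling sketch: the tool schema is rendered into the prompt
# via the chat template, and the model is expected to emit a structured call.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "vschandramourya/mistral-3.2-instruct-2506-quantized"
)

def get_weather(city: str) -> str:
    """Return the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    ...  # illustrative stub; a real tool would query a weather API

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Paris?"},
]

# transformers parses the function signature and docstring into a JSON schema
# and inserts it into the prompt according to the model's chat template.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    tokenize=False,
)
```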


Usage

The quantized model can be used with common LLM inference frameworks.

Note 1: We recommend using a relatively low temperature, such as temperature=0.15.

Note 2: Make sure to add a system prompt to the model to best tailor it to your needs.
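
A minimal generation sketch tying both notes together, assuming the model and tokenizer were loaded as in the quantization sketch above; the system prompt and question are placeholders.

```python
# Generation sketch: low temperature (0.15, as recommended) plus an explicit
# system prompt. Assumes `model` and `tokenizer` from the loading sketch.
messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "Summarize what quantization does."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.15,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```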

Memory Requirements

This quantized version requires significantly less GPU memory than the original model:

  • Original: ~55 GB of GPU RAM in bf16 or fp16
  • Quantized: Reduced memory footprint (exact requirements depend on quantization method used)
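
As a rough back-of-the-envelope check (weights only; the KV cache and activations add overhead), the arithmetic below shows why ~55 GB is plausible for bf16 and what 8-bit or 4-bit weights would imply:

```python
# Approximate weight memory for a 24B-parameter model at common precisions.
# These are rough estimates, not measured figures for this checkpoint.
params = 24e9

for name, bits in [("bf16/fp16", 16), ("int8", 8), ("int4", 4)]:
    gb = params * bits / 8 / 1e9
    print(f"{name:>10}: ~{gb:.0f} GB of weights")

# bf16/fp16: ~48 GB  (consistent with the ~55 GB figure once overhead is added)
#      int8: ~24 GB
#      int4: ~12 GB
```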

License

This model inherits the same license as the base model: Apache-2.0

Original Model

For benchmark results and detailed usage examples, please refer to the original model: togethercomputer/mistral-3.2-instruct-2506
