---
base_model:
- mistralai/Voxtral-Small-24B-2507
- mistralai/Mistral-Small-3.2-24B-Instruct-2506
license: apache-2.0
pipeline_tag: any-to-any
---

# Home-cooked Mistral Small Omni

This is a multimodal model created by merging Mistral Small 2506 (which has vision capabilities) and Voxtral 2507 (which has audio capabilities) using a modified version of the `mergekit` tool. For detailed merging instructions, refer to the sections below.

## License and Attribution

This model is a merged derivative work combining Mistral Small 2506 and Voxtral 2507, both originally released by Mistral AI under the Apache 2.0 license. The merged model is also distributed under the Apache 2.0 license, and the full license text, along with the original copyright notices, is included in this repository.

I have no affiliation, sponsorship, or formal relationship with Mistral AI. This project is an independent effort to combine the vision and audio capabilities of the two models.

## Steps to reproduce

### Merge the text models

Install `mergekit` pinned to this commit: https://github.com/arcee-ai/mergekit/tree/0027c5c51471fa891d438eccda5455ebe55b536e

Modify the `mergekit` source code: open `mergekit/merge_methods/generalized_task_arithmetic.py` and add the three lines marked below. The guard skips tensors whose shapes differ between the two models (i.e. tensors that exist in only one of them), which would otherwise crash the merge:

```py
    # Normalize the vectors to get the directions and angles
    v0 = normalize(v0, eps)
    v1 = normalize(v1, eps)

    if v0.shape != v1.shape:               # ADD THIS
        res = np.array([0.0])              # ADD THIS
        return maybe_torch(res, is_torch)  # ADD THIS

    # Dot product with the normalized vectors (can't use np.dot in W)
    dot = np.sum(v0 * v1)

    # If absolute value of dot product is almost 1, vectors are ~colinear, so use lerp
    if np.abs(dot) > DOT_THRESHOLD:
        res = lerp(t, v0_copy, v1_copy)
        return maybe_torch(res, is_torch)
```

Prepare a YAML config file for the merge:

```yaml
name: mistral-omni
merge_method: slerp
models:
  - model: ../models/Voxtral-Small-24B-2507
  - model: ../models/Mistral-Small-3.2-24B-Instruct-2506
base_model: ../models/Mistral-Small-3.2-24B-Instruct-2506
parameters:
  t:
    - filter: self_attn
      value: [0.1, 0.3, 0.5, 0.3, 0.1, 0]
    - filter: mlp
      value: [0.1, 0.3, 0.5, 0.3, 0.1, 0]
    - value: 0.5 # fallback for the rest of the tensors
dtype: bfloat16
```

Run the merge:

```sh
mergekit-yaml mistral_o.yaml ../models/mistral_o
```

Go to the `mistral_o` output directory, then download `tekken.json` from Voxtral and place it there: https://huggingface.co/mistralai/Voxtral-Small-24B-2507/blob/main/tekken.json

Finally, use `convert_hf_to_gguf.py` to convert the merged model to GGUF as usual.

### Merge the mmproj models

Download these mmproj files:

- Audio: https://huggingface.co/ggml-org/Voxtral-Mini-3B-2507-GGUF/blob/main/mmproj-Voxtral-Mini-3B-2507-Q8_0.gguf
- Vision: https://huggingface.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF/blob/main/mmproj-F16.gguf

Rename them to `audio.gguf` and `vision.gguf` respectively.

Then run [merge_mmproj_models.py](https://huggingface.co/ngxson/Home-Cook-Mistral-Small-Omni-24B-2507-GGUF/blob/main/merge_mmproj_models.py) from this repo. The output file will be `mmproj-model.gguf`.
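As a reference for what the patched merge method computes, here is a minimal standalone NumPy sketch of slerp with the shape-mismatch guard from the `generalized_task_arithmetic.py` patch above. This is an illustration based on that snippet, not mergekit's exact implementation; the `DOT_THRESHOLD` value and the `lerp`/`normalize` helpers are assumptions:

```python
import numpy as np

DOT_THRESHOLD = 0.9995  # assumed threshold for the "~colinear" fallback


def lerp(t, v0, v1):
    # Plain linear interpolation between the original (unnormalized) tensors
    return (1 - t) * v0 + t * v1


def normalize(v, eps=1e-8):
    # Scale to unit length; leave near-zero vectors untouched
    n = np.linalg.norm(v)
    return v if n < eps else v / n


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation with the shape-mismatch guard:
    tensors that exist in only one model (so their shapes differ)
    contribute a placeholder zero instead of crashing the merge."""
    v0_copy, v1_copy = np.copy(v0), np.copy(v1)
    v0, v1 = normalize(v0, eps), normalize(v1, eps)

    if v0.shape != v1.shape:
        # Guard added by the patch above
        return np.array([0.0])

    dot = np.sum(v0 * v1)
    if np.abs(dot) > DOT_THRESHOLD:
        # Nearly colinear directions: fall back to lerp
        return lerp(t, v0_copy, v1_copy)

    # Standard slerp: interpolate the angle between the two directions
    theta_0 = np.arccos(dot)
    theta_t = theta_0 * t
    sin_theta_0 = np.sin(theta_0)
    s0 = np.sin(theta_0 - theta_t) / sin_theta_0
    s1 = np.sin(theta_t) / sin_theta_0
    return s0 * v0_copy + s1 * v1_copy
```

With the guard in place, mismatched tensor pairs (such as Voxtral's audio-only weights) simply fall out of the interpolation, which is what allows `mergekit-yaml` to run over two models whose tensor sets do not fully overlap.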