Mistral-Small-24B-Instruct-2501-Reasoner-GGUF (Experimental)

This model is a GGUF build of a fine-tuned version of unsloth/Mistral-Small-24B-Instruct-2501-unsloth-bnb-4bit, trained on the open-thoughts/OpenThoughts-114k dataset to give the model reasoning capability.

The FP16 version is available here.

Fine Tuning Details

The base model was fine-tuned on the open-thoughts/OpenThoughts-114k dataset for 1 epoch on a single RTX 4090, taking approximately 71 hours.

LoRA details:

  • LoRA rank: 32
  • LoRA alpha: 16 (I forgot to update this after changing the rank, and only realised when the finetune was almost done)
  • Quantization: QLoRA
  • Optimizer: adamw_8bit
  • Learning rate: 2e-4
  • Weight decay: 0.01
  • Learning rate scheduler type: linear
  • Gradient accumulation steps: 8
  • Per-device train batch size: 2
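Two numbers follow directly from these settings: the LoRA scaling factor (alpha / rank) and the effective batch size per optimizer step. A minimal sketch in plain Python (the dicts below just collect the hyperparameters listed above for reference; this is not the actual training script):

```python
# Hyperparameters from the finetune above, collected for reference.
lora_config = {
    "r": 32,           # LoRA rank
    "lora_alpha": 16,  # left at 16 by accident after raising the rank
}

train_config = {
    "optim": "adamw_8bit",
    "learning_rate": 2e-4,
    "weight_decay": 0.01,
    "lr_scheduler_type": "linear",
    "gradient_accumulation_steps": 8,
    "per_device_train_batch_size": 2,
}

# LoRA scaling factor applied to the adapter output: alpha / rank.
lora_scaling = lora_config["lora_alpha"] / lora_config["r"]  # 0.5

# Effective batch size seen by the optimizer per update step.
effective_batch = (train_config["per_device_train_batch_size"]
                   * train_config["gradient_accumulation_steps"])  # 16
```

Note that with alpha left at 16 while the rank is 32, the adapter output is scaled by 0.5 rather than the common alpha = rank convention (scaling 1.0), which halves the adapter's contribution relative to that convention.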

Prompting Format

Recommended system prompt:

Your role as an assistant involves thoroughly exploring questions through a systematic long thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution. In the Thought section, detail your reasoning process using the specified format: <|begin_of_thought|> {thought with steps separated with '\n\n'} <|end_of_thought|> Each step should include detailed considerations such as analisying questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The solution should remain a logical, accurate, concise expression style and detail necessary step needed to reach the conclusion, formatted as follows: <|begin_of_solution|> {final formatted, precise, and clear solution} <|end_of_solution|> Now, try to solve the following question through the above guidelines:

The model's response will be in the format of:

<|begin_of_thought|>
...
<|end_of_thought|>

<|begin_of_solution|>
...
<|end_of_solution|>
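If you want to post-process responses, the two sections can be pulled out with a small parser. A sketch (the tag strings match the format above; the function name and sample response are illustrative):

```python
import re

def split_reasoning(response: str):
    """Extract the Thought and Solution sections from a response
    formatted with the <|begin_of_thought|>/<|begin_of_solution|> tags."""
    thought = re.search(
        r"<\|begin_of_thought\|>(.*?)<\|end_of_thought\|>", response, re.DOTALL)
    solution = re.search(
        r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>", response, re.DOTALL)
    return (
        thought.group(1).strip() if thought else None,
        solution.group(1).strip() if solution else None,
    )

example = (
    "<|begin_of_thought|>\nTry small cases first...\n<|end_of_thought|>\n\n"
    "<|begin_of_solution|>\nThe answer is 42.\n<|end_of_solution|>"
)
thought, solution = split_reasoning(example)
```

Returning None for a missing section lets you detect responses where the model skipped the reasoning block.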

You may need to change the default prompt template to Llama 3, as highlighted here, if you're using a program like LM Studio.

If you decide not to use the recommended system prompt, you can instead prefix the model's response with <|begin_of_thought|> to force the model into reasoning mode.
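Assuming the Llama 3 chat template mentioned above, forcing reasoning mode amounts to prefilling the assistant turn with the opening tag so the model continues from inside the thought block. A sketch (the template strings are the standard Llama 3 ones; whether your runtime applies the template for you depends on the tool, so adapt as needed):

```python
def build_prompt(user_message: str, force_reasoning: bool = True) -> str:
    """Assemble a Llama 3-style prompt, optionally prefilling the
    assistant turn with <|begin_of_thought|> to force reasoning mode."""
    prompt = (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    if force_reasoning:
        # Prefill: generation continues from inside the thought block.
        prompt += "<|begin_of_thought|>"
    return prompt

p = build_prompt("What is 17 * 24?")
```

The same trick works in any frontend that supports assistant-turn prefill: start the assistant message with the tag and let the model complete it.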

Appreciation

Thank you so much to the Unsloth team for their efforts in bringing finetuning to consumer-level devices. This finetune wouldn't be possible without their contributions.

GGUF Details

  • Model size: 23.6B params
  • Architecture: llama
  • Available quantizations: 3-bit, 4-bit, 5-bit, 8-bit
