---
license: apache-2.0
datasets:
- open-r1/OpenR1-Math-220k
- yentinglin/s1K-1.1-trl-format
- simplescaling/s1K-1.1
language:
- en
metrics:
- accuracy
base_model:
- mistralai/Mistral-Small-24B-Instruct-2501
pipeline_tag: text-generation
tags:
- reasoning
model-index:
- name: yentinglin/Mistral-Small-24B-Instruct-2501-reasoning
results:
- task:
type: text-generation
dataset:
name: MATH-500
type: MATH
metrics:
- name: pass@1
type: pass@1
value: 0.95
verified: false
source:
name: yentinglin/zhtw-reasoning-eval-leaderboard
url: https://huggingface.co/spaces/yentinglin/zhtw-reasoning-eval-leaderboard
- task:
type: text-generation
dataset:
name: AIME 2025
type: AIME
metrics:
- name: pass@1
type: pass@1
value: 0.5333
verified: false
source:
name: yentinglin/zhtw-reasoning-eval-leaderboard
url: https://huggingface.co/spaces/yentinglin/zhtw-reasoning-eval-leaderboard
- task:
type: text-generation
dataset:
name: AIME 2024
type: AIME
metrics:
- name: pass@1
type: pass@1
value: 0.6667
verified: false
source:
name: yentinglin/zhtw-reasoning-eval-leaderboard
url: https://huggingface.co/spaces/yentinglin/zhtw-reasoning-eval-leaderboard
- task:
type: text-generation
dataset:
name: GPQA Diamond
type: GPQA
metrics:
- name: pass@1
type: pass@1
value: 0.62022
verified: false
source:
name: yentinglin/zhtw-reasoning-eval-leaderboard
url: https://huggingface.co/spaces/yentinglin/zhtw-reasoning-eval-leaderboard
---
# Mistral-Small-Reasoning
This model is a fine-tuned version of [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501), optimized for mathematical reasoning tasks. It was fine-tuned on the [OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k) and [s1K-1.1](https://huggingface.co/datasets/simplescaling/s1K-1.1) datasets to enhance its reasoning capabilities.
## Model Details
### Model Description
- **Developed by:** [Yenting Lin](https://www.linkedin.com/in/yen-ting-lin-416732b3/)
- **Funded by:** [Ubitus](https://ubitus.net)
- **Model type:** Instruction-tuned language model for reasoning
- **Language(s) (NLP):** English (en)
- **License:** Apache 2.0
- **Finetuned from model:** [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501)
## How to Get Started with the Model
A demo is available at [twllm.com](https://twllm.com/models/yentinglin/mistral-sft), and inference can be run using vLLM or sglang.
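As a minimal sketch (assuming `vllm` is installed; the exact parallelism and context-length flags below are illustrative and should be adjusted to your hardware), the model can be served through vLLM's OpenAI-compatible server:

```shell
# Serve this checkpoint with vLLM's OpenAI-compatible API server.
# --tensor-parallel-size shards the 24B model across GPUs;
# --max-model-len matches the 32768-token training sequence length.
vllm serve yentinglin/Mistral-Small-24B-Instruct-2501-reasoning \
  --tensor-parallel-size 2 \
  --max-model-len 32768
```

Once running, the server answers standard `/v1/chat/completions` requests (port 8000 by default), so any OpenAI-compatible client can query it.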
## Training Details
The model was trained on **4×8 H100 GPUs** (four nodes of eight GPUs each), provided by [**Ubitus**](https://ubitus.net).
Training was run with [Axolotl](https://github.com/axolotl-ai-cloud/axolotl), commit [`a98526ef7843a3e8aa006f260e6b4fb8912b5f1a`](https://github.com/axolotl-ai-cloud/axolotl/tree/a98526ef7843a3e8aa006f260e6b4fb8912b5f1a), using the config below:
```yaml
base_model: mistralai/Mistral-Small-24B-Instruct-2501
plugins:
- axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true
datasets:
- path: yentinglin/s1K-1.1-trl-format
type: chat_template
chat_template: tokenizer_default
field_messages: messages
message_field_role: role
message_field_content: content
- path: open-r1/OpenR1-Math-220k
type: chat_template
chat_template: tokenizer_default
field_messages: messages
message_field_role: from
message_field_content: value
dataset_prepared_path:
val_set_size: 0.0
output_dir: ./placeholder/
sequence_len: 32768
sample_packing: true
eval_sample_packing: False
pad_to_sequence_len: true
wandb_project: Reasoning
wandb_entity:
wandb_watch:
wandb_name: Mistral-24B-SFT-220k
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 5
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 2e-5
train_on_inputs: false
group_by_length: false
bf16: auto
tf32: false
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: false
logging_steps: 1
flash_attention: true
warmup_ratio: 0.1
saves_per_epoch: 2
weight_decay: 0.0
deepspeed: deepspeed_configs/zero3_bf16.json
special_tokens:
pad_token: ""
```
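A quick sanity check on the config above: with one data-parallel rank per GPU under DeepSpeed ZeRO-3 (an assumption, since the config does not state the topology explicitly), the per-GPU micro batch and gradient accumulation imply the following effective global batch size:

```python
# Effective global batch size implied by the training setup
# (assumes one data-parallel rank per GPU under ZeRO-3).
num_gpus = 4 * 8           # 4 nodes x 8 H100s
micro_batch_size = 1       # per-GPU batch (micro_batch_size in the config)
grad_accum_steps = 4       # gradient_accumulation_steps in the config
effective_batch = num_gpus * micro_batch_size * grad_accum_steps
print(effective_batch)     # -> 128 sequences (each up to 32768 tokens, packed)
```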
## Evaluation
The evaluation code is available at [Hugging Face Open-R1](https://github.com/huggingface/open-r1). Note that I have updated the AIME 25 dataset to the full set, available at [AIME 2025](https://huggingface.co/datasets/yentinglin/aime_2025).
The results below are averaged over multiple runs; see the full evaluation details [here](https://huggingface.co/datasets/yentinglin/zhtw-reasoning-details-_fsx_ubuntu_yentinglin_ckpt_run_20250214_1600_checkpoint-800_).
| Pass@1 | # Params | MATH-500 | AIME 2025 | AIME 2024 | GPQA Diamond |
|-----------------------------------|---------|---------|-----------|-----------|--------------|
| **Mistral-24B-Reasoning (Ours)** | 24B | 95.0 | 53.33 | 66.67 | 62.02 |
| Mistral-24B-Instruct | 24B | 70.6 | - | - | 45.3 |
| s1.1-32B | 32B | 93.2 | 40.0 | 56.7 | 61.62 |
| LIMO | 32B | 94.8 | 36.67 | 57.1 | 59.09 |
| DeepSeek-R1-Distill-Llama-70B | 70B | 94.5 | 46.67 | 70.0 | 65.2 |
| DeepSeek-R1-Distill-Qwen-32B | 32B | 94.3 | 60.0 | 72.6 | 62.1 |
| DeepSeek-R1 | 671B | 97.3 | 70.0 | 72.6 | 71.5 |
| o1 | - | 96.4 | 79.0 | - | 75.7 |
| o3-mini (high) | - | 97.9 | 86.5 | - | 77.2 |
| o3-mini (medium) | - | 97.3 | 76.5 | - | 74.9 |
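The pass@1 numbers above are averages over several sampled generations per problem. A common way to compute them is the unbiased pass@k estimator of Chen et al. (2021), sketched below (a generic illustration, not the exact evaluation script used here):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations of which c are
    correct, solves the problem."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples -> guaranteed hit
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k=1 this reduces to the fraction of correct generations:
print(pass_at_k(4, 3, 1))  # -> 0.75
```

Averaging `pass_at_k(n, c, 1)` over all problems in a benchmark gives the reported pass@1 score.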
## Citation
If you use this model, please cite:
```bibtex
@misc{yentinglin2025_mistral_reasoning,
  author       = {Yenting Lin},
  title        = {Mistral-Small-24B-Instruct-2501-reasoning},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/yentinglin/Mistral-Small-24B-Instruct-2501-reasoning}}
}
```