---
license: apache-2.0
datasets:
- open-r1/OpenR1-Math-220k
- yentinglin/s1K-1.1-trl-format
- simplescaling/s1K-1.1
language:
- en
metrics:
- accuracy
base_model:
- mistralai/Mistral-Small-24B-Instruct-2501
pipeline_tag: text-generation
tags:
- reasoning
model-index:
- name: yentinglin/Mistral-Small-24B-Instruct-2501-reasoning
  results:
  - task:
      type: text-generation
    dataset:
      name: MATH-500
      type: MATH
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.95
      verified: false
    source:
      name: yentinglin/zhtw-reasoning-eval-leaderboard
      url: https://huggingface.co/spaces/yentinglin/zhtw-reasoning-eval-leaderboard
  - task:
      type: text-generation
    dataset:
      name: AIME 2025
      type: AIME
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.5333
      verified: false
    source:
      name: yentinglin/zhtw-reasoning-eval-leaderboard
      url: https://huggingface.co/spaces/yentinglin/zhtw-reasoning-eval-leaderboard
  - task:
      type: text-generation
    dataset:
      name: AIME 2024
      type: AIME
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.6667
      verified: false
    source:
      name: yentinglin/zhtw-reasoning-eval-leaderboard
      url: https://huggingface.co/spaces/yentinglin/zhtw-reasoning-eval-leaderboard
  - task:
      type: text-generation
    dataset:
      name: GPQA Diamond
      type: GPQA
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.62022
      verified: false
    source:
      name: yentinglin/zhtw-reasoning-eval-leaderboard
      url: https://huggingface.co/spaces/yentinglin/zhtw-reasoning-eval-leaderboard
---

# Mistral-Small-Reasoning

This model is a fine-tuned version of [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501), optimized for mathematical reasoning tasks. It was fine-tuned on [OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k) and [s1K-1.1](https://huggingface.co/datasets/simplescaling/s1K-1.1) to strengthen its reasoning capabilities.

## Model Details

### Model Description

- **Developed by:** [Yenting Lin](https://www.linkedin.com/in/yen-ting-lin-416732b3/)
- **Funded by:** [Ubitus](https://ubitus.net)
- **Model type:** Instruction-tuned language model for reasoning
- **Language(s) (NLP):** English (en)
- **License:** Apache 2.0
- **Finetuned from model:** [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501)

## How to Get Started with the Model

A demo is available at [twllm.com](https://twllm.com/models/yentinglin/mistral-sft), and the model can be served with vLLM or sglang; a minimal vLLM example is sketched below.
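The snippet below is a minimal offline-inference sketch using vLLM's Python API (it assumes a recent vLLM release that provides `LLM.chat`). The sampling parameters and `tensor_parallel_size` are illustrative placeholders, not tuned recommendations from this card.

```python
# Minimal vLLM offline-inference sketch (illustrative settings, not tuned values).
from vllm import LLM, SamplingParams

# Model ID as published on this card; a 24B model typically needs tensor
# parallelism (or quantization) to fit on smaller GPUs.
llm = LLM(
    model="yentinglin/Mistral-Small-24B-Instruct-2501-reasoning",
    tensor_parallel_size=2,
)

sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=8192)

messages = [
    {"role": "user", "content": "Prove that the sum of two odd integers is even."},
]

# chat() applies the model's chat template before generation.
outputs = llm.chat(messages, sampling)
print(outputs[0].outputs[0].text)
```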
## Training Details

The model was trained on **4×8 H100 GPUs** provided by [**Ubitus**](https://ubitus.net).

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)

### Training config

Axolotl version: [`a98526ef7843a3e8aa006f260e6b4fb8912b5f1a`](https://github.com/axolotl-ai-cloud/axolotl/tree/a98526ef7843a3e8aa006f260e6b4fb8912b5f1a)

```yaml
base_model: mistralai/Mistral-Small-24B-Instruct-2501

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

datasets:
  - path: yentinglin/s1K-1.1-trl-format
    type: chat_template
    chat_template: tokenizer_default
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: open-r1/OpenR1-Math-220k
    type: chat_template
    chat_template: tokenizer_default
    field_messages: messages
    message_field_role: from
    message_field_content: value

dataset_prepared_path:
val_set_size: 0.0
output_dir: ./placeholder/

sequence_len: 32768
sample_packing: true
eval_sample_packing: False
pad_to_sequence_len: true

wandb_project: Reasoning
wandb_entity:
wandb_watch:
wandb_name: Mistral-24B-SFT-220k
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 5
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
logging_steps: 1
flash_attention: true

warmup_ratio: 0.1
saves_per_epoch: 2
weight_decay: 0.0
deepspeed: deepspeed_configs/zero3_bf16.json
special_tokens:
  pad_token: ""
```
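As a rough illustration of the dataset mapping in the config above: `yentinglin/s1K-1.1-trl-format` stores messages keyed by `role`/`content`, while `open-r1/OpenR1-Math-220k` messages use `from`/`value` and are remapped via `message_field_role`/`message_field_content`. The records below are invented for illustration, not taken from the datasets.

```python
# Hypothetical records showing the message schemas that the axolotl
# `datasets` mapping above expects; the problem text is made up.

# yentinglin/s1K-1.1-trl-format: messages keyed by "role" / "content".
s1k_style_record = {
    "messages": [
        {"role": "user", "content": "Solve x^2 - 5x + 6 = 0."},
        {"role": "assistant", "content": "Factoring gives (x-2)(x-3)=0, so x = 2 or x = 3."},
    ]
}

# open-r1/OpenR1-Math-220k: messages keyed by "from" / "value", which the
# config remaps through message_field_role / message_field_content.
openr1_style_record = {
    "messages": [
        {"from": "user", "value": "Solve x^2 - 5x + 6 = 0."},
        {"from": "assistant", "value": "Factoring gives (x-2)(x-3)=0, so x = 2 or x = 3."},
    ]
}
```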

## Evaluation

The evaluation code is available at [Hugging Face Open-R1](https://github.com/huggingface/open-r1). Note that I have updated the AIME 2025 benchmark to the full problem set, available at [AIME 2025](https://huggingface.co/datasets/yentinglin/aime_2025).

The results below are pass@1 scores averaged over multiple runs; see the full evaluation details [here](https://huggingface.co/datasets/yentinglin/zhtw-reasoning-details-_fsx_ubuntu_yentinglin_ckpt_run_20250214_1600_checkpoint-800_).

| Model (pass@1)                   | # Params | MATH-500 | AIME 2025 | AIME 2024 | GPQA Diamond |
|----------------------------------|----------|----------|-----------|-----------|--------------|
| **Mistral-24B-Reasoning (Ours)** | 24B      | 95.0     | 53.33     | 66.67     | 62.02        |
| Mistral-24B-Instruct             | 24B      | 70.6     | -         | -         | 45.3         |
| s1.1-32B                         | 32B      | 93.2     | 40.0      | 56.7      | 61.62        |
| LIMO                             | 32B      | 94.8     | 36.67     | 57.1      | 59.09        |
| DeepSeek-R1-Distill-Llama-70B    | 70B      | 94.5     | 46.67     | 70.0      | 65.2         |
| DeepSeek-R1-Distill-Qwen-32B     | 32B      | 94.3     | 60.0      | 72.6      | 62.1         |
| DeepSeek-R1                      | 671B     | 97.3     | 70.0      | 72.6      | 71.5         |
| o1                               | -        | 96.4     | 79.0      | -         | 75.7         |
| o3-mini (high)                   | -        | 97.9     | 86.5      | -         | 77.2         |
| o3-mini (medium)                 | -        | 97.3     | 76.5      | -         | 74.9         |

## Citation

If you use this model, please cite:

```bib
@article{yentinglin2025_mistral_reasoning,
  author  = {Yenting Lin},
  title   = {Mistral-Small-24B-Instruct-2501-reasoning},
  journal = {Hugging Face},
  year    = {2025},
  url     = {https://huggingface.co/yentinglin/Mistral-Small-24B-Instruct-2501-reasoning}
}
```