---
license: cc-by-nc-2.0
base_model:
- Qwen/Qwen2.5-3B-Instruct
tags:
- Reasoning
- GRPO
- DeepSeek
- CoT
- finetune
---

A fine-tuned variant of **Qwen 2.5 3B Instruct** designed for **toggleable reasoning** and improved **instruction following**. The model was built by engineers at [xioserv.com](https://xioserv.com) and fine-tuned with GRPO to enhance performance on structured reasoning tasks.

---
## Overview
The **AaryanK/Qwen_2.5_3B_Instruct_GRPO_Reasoning_XIOSERV** model is a refined version of the Qwen 2.5 3B Instruct model. It is optimized to provide responses in a structured format, making it particularly useful for tasks requiring clear separation between reasoning steps and final answers.
### **Toggleable Reasoning Mode**
- If you include the **system prompt**, the model will **explicitly separate reasoning and the final answer**.
- If you **omit the system prompt**, the model will **respond naturally** without structured reasoning.

This makes the model highly **versatile**, allowing users to choose between structured reasoning and direct responses based on their specific use case.

---
## System Prompt
To enable structured reasoning, use the following system prompt:
```
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
```
If you do not include this prompt, the model will respond in a **standard, conversational** manner without explicitly separating reasoning from the final answer.
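For concreteness, here is a minimal sketch of the toggle in the OpenAI-style chat-message format that llama.cpp's chat endpoints (and most other runtimes) accept; the example question is just a placeholder:

```python
SYSTEM_PROMPT = """Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>"""

question = {"role": "user", "content": "What is 17 * 24?"}

# Structured reasoning mode: include the system prompt.
reasoning_messages = [{"role": "system", "content": SYSTEM_PROMPT}, question]

# Conversational mode: omit the system prompt entirely.
conversational_messages = [question]
```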

---
## Methodology
To replicate the 'aha moment' observed in **DeepSeek's R1**, we trained with **Group Relative Policy Optimization (GRPO)**, a variant of **Proximal Policy Optimization (PPO)** that replaces PPO's learned value network with a baseline computed from a group of sampled responses, improving reasoning capability while reducing memory usage. This follows the approach introduced in the **DeepSeekMath** paper, where GRPO was instrumental in advancing mathematical reasoning in language models. Trained this way, the model refines its problem-solving strategies on its own, mirroring R1's **self-reflective behavior**.
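To illustrate the core idea (this is a sketch of the advantage term from the published GRPO objective, not our training code): GRPO samples a group of responses per prompt and normalizes each response's reward by the group's mean and standard deviation.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO advantages for one prompt's group of sampled responses.

    Each response's reward is normalized by the group mean and standard
    deviation; the group itself serves as the baseline, replacing the
    value network that PPO would otherwise need.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        # All rewards identical: no preference signal within this group.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: rewards for four sampled completions of the same prompt.
print(group_relative_advantages([1.0, 0.0, 0.5, 1.0]))
```

Because the baseline comes from the group itself rather than from a separate critic model, this is the source of the memory savings mentioned above.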

---
## Usage
We provide **GGUF files** that run with **llama.cpp** for efficient inference; follow the instructions in the [llama.cpp repository](https://github.com/ggerganov/llama.cpp) to build and run it. Include the system prompt in your input **if you want structured reasoning output**; otherwise, the model behaves like a standard instruct model.
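If you prefer calling the model from Python, the community **llama-cpp-python** bindings load the same GGUF files. A minimal sketch (the GGUF filename below is a placeholder; substitute the actual file from this repository):

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Placeholder filename: point this at the GGUF file you downloaded.
llm = Llama(model_path="qwen2.5-3b-instruct-grpo-reasoning.gguf", n_ctx=4096)

# Build `messages` as in the toggle sketch above: prepend the system
# prompt for structured reasoning, or send the user turn alone for a
# plain conversational reply.
messages = [{"role": "user", "content": "What is 17 * 24?"}]

result = llm.create_chat_completion(messages=messages, max_tokens=512)
print(result["choices"][0]["message"]["content"])
```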

---
## Acknowledgements
- **xioserv.com** – For the engineering efforts in fine-tuning this model.
- **Hugging Face** – For providing an accessible platform to share and deploy models.
For any questions or contributions, please open an issue or submit a pull request on our [GitHub repository](https://github.com/AaryanK/Qwen_2.5_3B_Instruct_GRPO_Reasoning_XIOSERV).

---
Happy coding!