---
license: llama3.2
datasets:
- openai/gsm8k
language:
- en
base_model:
- unsloth/Llama-3.2-1B-Instruct
library_name: transformers
tags:
- llama
- think
---

# MiniThink-1B-base

![image/png](https://cdn-uploads.huggingface.co/production/uploads/646ba0d4c7f672003c851ed2/rsr_FSCzYXN5OTf5UrvCU.png)

MiniThink-1B is an experiment to reproduce the "Aha!" moment in AI. It is trained using a modified version of the method described in the [Unsloth R1 training blog](https://unsloth.ai/blog/r1-reasoning) and the [notebook provided for training Llama 3.1 8B to learn R1 reasoning](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb). MiniThink is a fine-tuned version of the `unsloth/Llama-3.2-1B-Instruct` model.

## Model Details

- **Base Model**: `unsloth/Llama-3.2-1B-Instruct`
- **Training**: Fine-tuned using progressive LoRA (ranks: 16 → 32 → 64) with Unsloth's optimization framework
- **Task**: Mathematical and logical reasoning with explicit, step-by-step thought processes
- **Training Data**: GSM8K dataset enhanced with think-aloud prompting
- **Input Format**: Questions requiring detailed, structured reasoning
- **Output Format**: A comprehensive thinking process enclosed in `<think>` tags, followed by the final answer

## Dataset used

The model was trained on a modified version of OpenAI's GSM8K dataset, which contains about 8K grade-school math word problems whose answers are single numbers. To improve training results, the dataset was slightly modified to exclude answers containing comma- or period-separated numbers.

## System Prompt

The model is trained with the following system prompt to guide its reasoning process:

```
# Define special tokens for the thinking process
THINK_START = "<think>"
THINK_END = "</think>"

SYSTEM_PROMPT = f"""Show your reasoning process using <think> tags, then provide your answer.

For example:
Question: "Janet has 3 apples. She buys 2 more. How many apples does she have?"

{THINK_START}
Let me solve this step by step:
- Janet starts with 3 apples
- She buys 2 more apples
- I need to add: 3 + 2 = 5
Wait, let me verify:
- Initial apples: 3
- Added apples: 2
Yes, the total is 5 apples
{THINK_END}

5"""
```

## Usage

The model expects a chat-like input and responds with a structured breakdown of its reasoning; a runnable inference sketch is provided further down this card. For example:

**Input:**

Question: "Janet has 3 apples. She buys 2 more. How many apples does she have?"

**Output:**

```
<think>
Let me solve this step by step:
- Janet starts with 3 apples
- She buys 2 more apples
- I need to add: 3 + 2 = 5
Wait, let me verify:
- Initial apples: 3
- Added apples: 2
Yes, the total is 5 apples
</think>

5
```

## Limitations

- As a 1B-parameter model, its performance is naturally more limited than that of larger models.
- It is optimized for mathematical and logical tasks; complex computations may occasionally yield errors.
- Always verify critical outputs.

## Training

The model was trained using:

- **Progressive LoRA**: Gradually increasing the LoRA rank from 16 to 32 and finally 64
- **Mixed Precision Training**: Using bf16 where supported for optimal performance
- **GRPO (Group Relative Policy Optimization)**: Implemented via the Unsloth framework for reward-guided training
- **Data**: GSM8K dataset enriched with explicit think-aloud examples

A sketch of this training setup appears at the end of this card.

## License

This model adheres to the licensing terms of the base Llama-3.2-1B model. Please refer to Meta's Llama 3.2 license for details on usage terms and conditions.
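
## Example Inference

The snippet below is a minimal inference sketch using the Hugging Face `transformers` chat API. The model path, the shortened system prompt, and the generation settings are illustrative placeholders rather than the exact configuration used for this model.

```python
# Minimal inference sketch; model path and generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniThink-1B-base"  # replace with the actual Hub repo id or a local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Shortened system prompt; the full prompt (with the worked example) is shown above.
system_prompt = "Show your reasoning process using <think> tags, then provide your answer."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Janet has 3 apples. She buys 2 more. How many apples does she have?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```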
## Framework

Developed using the [Unsloth Framework](https://github.com/unslothai/unsloth), this model leverages techniques like GRPO and progressive LoRA optimization for efficient training and fine-tuning of large language models.
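
## Training Setup Sketch

The exact training script is not included in this card. The sketch below shows one way to set up a single GRPO fine-tuning stage with Unsloth and TRL along the lines described above, including the GSM8K answer filtering mentioned in the dataset section. The reward function, hyperparameters, and the single LoRA rank shown are illustrative placeholders; the actual run used a progressive rank schedule (16 → 32 → 64) and its own reward design.

```python
# Sketch of one GRPO fine-tuning stage with Unsloth + TRL (illustrative values only).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer
from unsloth import FastLanguageModel

SYSTEM_PROMPT = "Show your reasoning process using <think> tags, then provide your answer."

# Load the base model and attach LoRA adapters (one stage of the 16 -> 32 -> 64 schedule).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",
    max_seq_length=1024,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # raised to 32 and then 64 in later stages
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    use_gradient_checkpointing="unsloth",
)

def keep_clean_answer(example):
    # Mirror the card's filtering: drop answers that contain commas or decimal points.
    final = example["answer"].split("####")[-1].strip()
    return "," not in final and "." not in final

def to_prompt(example):
    # Build a chat prompt and keep only the final reference number ("#### 42" -> "42").
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": example["question"]},
        ],
        "answer": example["answer"].split("####")[-1].strip(),
    }

dataset = load_dataset("openai/gsm8k", "main", split="train").filter(keep_clean_answer).map(to_prompt)

def correctness_reward(prompts, completions, answer, **kwargs):
    # Toy reward: 1.0 if the reference number appears after the closing think tag, else 0.0.
    rewards = []
    for completion, ref in zip(completions, answer):
        text = completion[0]["content"]
        final_part = text.split("</think>")[-1]
        rewards.append(1.0 if ref in final_part else 0.0)
    return rewards

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[correctness_reward],
    args=GRPOConfig(
        learning_rate=5e-6,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        num_generations=6,
        max_prompt_length=256,
        max_completion_length=512,
        max_steps=250,
        bf16=True,
        output_dir="outputs",
    ),
    train_dataset=dataset,
)
trainer.train()
```

In practice, each stage would save the adapter (or merge it into the base weights) before the next stage re-attaches LoRA with a higher rank.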