|
--- |
|
base_model: |
|
- saishshinde15/Clyrai_Base_Reasoning |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- qwen2 |
|
- trl |
|
- reasoning |
|
- deepseekR1 |
|
- advanced-finetuning |
|
license: apache-2.0 |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# Clyrai Vortex Reasoning |
|
|
|
- **Developed by:** clyrai |
|
- **License:** apache-2.0 |
|
- **Fine-tuned from:** [saishshinde15/Clyrai_Base_Reasoning](https://huggingface.co/saishshinde15/TethysAI_Base_Reasoning) |
|
- **Category:** Experimental, Research |
|
|
|
## **Introduction** |
|
|
|
Clyrai Vortex Reasoning is an **experimental model** that advances the structured reasoning capabilities pioneered by [Clyrai_Base_Reasoning](https://huggingface.co/saishshinde15/TethysAI_Base_Reasoning). While the Base Reasoning model used **Group Relative Policy Optimization (GRPO)** to strengthen step-by-step logical thought processes, similar to **DeepSeek-R1**, this model takes a different approach: **it eliminates GRPO and relies instead on high-quality Supervised Fine-Tuning (SFT)**.
|
|
|
The core objective was to investigate whether **deep reasoning and self-questioning behavior could emerge purely through SFT on high-quality datasets**. The results were highly promising: the model successfully **questions itself internally**, improves reasoning depth, and consistently generates structured, logical responses. |
|
|
|
--- |
|
|
|
## **Key Features** |
|
|
|
### **1️⃣ Advanced Reasoning Without GRPO** |
|
This model **does not rely on GRPO** yet **achieves similar self-reflective thought processes**, proving that structured reasoning can be induced through **high-quality SFT alone**. |
|
|
|
### **2️⃣ Self-Questioning and Iterative Thinking** |
|
The model **actively asks itself intermediate questions before answering**, mimicking the deep **reflection-based thought process** of models like DeepSeek-R1. This leads to **more reliable** and **well-structured** responses. |
|
|
|
### **3️⃣ High-Quality SFT on a Curated Dataset** |
|
To compensate for the lack of reinforcement learning, we used an **extensive dataset** tailored for deep reasoning. This dataset includes: |
|
- **Mathematical proofs & logical puzzles** |
|
- **Complex multi-step problem-solving tasks** |
|
- **Philosophical and ethical reasoning** |
|
- **Scientific hypothesis evaluation** |
|
|
|
### **4️⃣ Implicit Use of `<think>` and `<answer>` Tokens** |
|
The model internally uses **special reasoning markers** (`<think>` and `<answer>`) to structure its responses, though these markers may not always be visible in the final output. This encourages a **consistent and methodical approach** to answering questions.
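When the markers do surface in the decoded text, they can be separated with a small helper. The sketch below is illustrative, not part of the model's tooling; it assumes the tags appear as literal `<think>…</think>` and `<answer>…</answer>` spans and falls back to treating the whole text as the answer when they are absent, since the card notes they may not always be visible.

```python
import re

def split_reasoning(text: str):
    """Split a response into (reasoning, answer) parts.

    Hypothetical helper: assumes literal <think>/<answer> tags.
    Falls back to the full text as the answer when markers are absent.
    """
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    reasoning = think.group(1).strip() if think else ""
    if answer is None:
        return reasoning, text.strip()
    return reasoning, answer.group(1).strip()

sample = "<think>x + 3 = 10, so x = 10 - 3.</think><answer>x = 7</answer>"
reasoning, answer = split_reasoning(sample)
print(answer)  # x = 7
```

This keeps downstream code robust whether or not the reasoning trace is emitted explicitly.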
|
|
|
### **5️⃣ Part of the Clyrai Vortex Family**
|
This model belongs to the **Clyrai Vortex series**, a collection of fine-tuned models pushing the boundaries of **SFT-based reasoning without reinforcement learning**. |
|
|
|
--- |
|
|
|
## **Breakthrough Insights** |
|
|
|
| Feature | Base Reasoning (GRPO) | Vortex Reasoning (SFT-Only) |
|
|----------------------------------|------------------------|----------------------------| |
|
| Structured Thought Process | ✅ Yes (GRPO) | ✅ Yes (SFT) | |
|
| Self-Reflection & Questioning | ✅ Strong | ✅ Equally Strong | |
|
| GRPO-Free Optimization | ❌ No | ✅ Achieved via SFT | |
|
| Step-by-Step Problem Solving | ✅ Yes | ✅ Yes | |
|
| Use of `<think>` and `<answer>` | ✅ Explicit | ✅ Implicit (Internal Use) | |
|
|
|
**Key Takeaway:** This experiment confirms that **reinforcement learning is not the only pathway to advanced reasoning capabilities**—with the right dataset and SFT strategies, models can **self-reflect and logically deduce answers** in a structured manner. |
|
|
|
--- |
|
|
|
## **How to Use** |
|
|
|
### **Running with Transformers** |
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model & tokenizer
model_name = "saishshinde15/Clyrai_Vortex_Reasoning"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto").to(device)

# Prepare input prompt
messages = [
    {"role": "system", "content": "You are an advanced AI assistant. Provide answers in a clear, step-by-step manner."},
    {"role": "user", "content": "If x + 3 = 10, what is x?"},
]

# Apply chat template and tokenize
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate and decode only the newly generated tokens (skip the echoed prompt)
outputs = model.generate(input_ids, max_new_tokens=512)
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)

print(response)
|
``` |