Arc-Teacher-8B
Arc-Teacher-8B is a specialized reasoning model trained with the Reinforcement Learning from Teacher (RLT) methodology. It scores 46% on AIME 2025 (mathematical reasoning) and transfers its reasoning to a second domain, policy violation detection on CRM-Arena.
Model Details
- Base Model: Qwen3-8B
- Size: 8B parameters
- Training Method: Teacher-Student Reinforcement Learning with Group Relative Policy Optimization (GRPO)
- Primary Capability: Teaching through reasoning traces rather than direct problem solving
Training Process
- SFT Warmup: Initial supervised fine-tuning on 7k samples from the bespoke-stratos-17k dataset
- RL Training: Teacher-student framework in which the model learns to generate teaching traces
- Cross-Domain Evaluation: Tested on the CRM-Arena dataset for policy violation detection
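The GRPO step named above scores each sampled output relative to the other outputs drawn for the same prompt. The sketch below shows only that group-relative normalization. The reward values are hypothetical, the function name is illustrative, and the use of the population standard deviation is an assumption; the card does not describe how rewards for teaching traces are computed.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps keeps the division defined when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four traces sampled from the same prompt
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Traces that score above the group mean receive a positive advantage and are reinforced; traces below it are discouraged.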
Training Pipeline
[Figure: training pipeline diagram]
Performance
AIME 2025 Results
- Accuracy: 46%
- The model generates detailed thinking traces that guide problem-solving
Cross-Domain Transfer (CRM-Arena)
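- Accuracy on cases with an actual violation: 69.2%
- Accuracy on cases with no violation: 17% (non-violations correctly identified as such)
- Suggests that the reasoning patterns transfer across domains, although the low non-violation accuracy indicates a tendency to over-flag violations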
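The two CRM-Arena figures above are per-class accuracies: one computed over cases that contain a violation, the other over cases that do not. A minimal sketch of that computation follows. The labels and predictions are made-up toy data, not CRM-Arena results, and `per_class_accuracy` is a hypothetical helper rather than part of any evaluation harness.

```python
def per_class_accuracy(labels, predictions):
    """Accuracy computed separately for violation and non-violation cases.

    labels, predictions: lists of booleans, True meaning "policy violated".
    """
    result = {}
    for cls, name in ((True, "violation"), (False, "no_violation")):
        indices = [i for i, y in enumerate(labels) if y == cls]
        if not indices:
            result[name] = None  # no examples of this class
            continue
        correct = sum(1 for i in indices if predictions[i] == cls)
        result[name] = correct / len(indices)
    return result


# Toy data (hypothetical): three violation cases, two non-violation cases
labels = [True, True, True, False, False]
predictions = [True, True, False, True, False]
scores = per_class_accuracy(labels, predictions)
print(scores)
```

Reporting the two rates separately matters here because a model that flags nearly every case would score well on the first and poorly on the second.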
Usage
Basic Generation
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "Arc-Intelligence/arc-teacher-8b",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Arc-Intelligence/arc-teacher-8b")
Generating Teaching Traces
The model is optimized for generating detailed reasoning traces. Use the following format:
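def generate_thinking_trace(model, tokenizer, problem):
    # Seed the trace with an opening <think> tag and the problem statement.
    prompt = f"""<think>
Let me work through this problem step by step.
Problem: {problem}
First, I need to understand what we're looking for...
"""
    # Move the tokenized inputs to the model's device before generating.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=2000,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    # The decoded text contains the prompt followed by the generated trace.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)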
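Because the function above returns the prompt and the generated text as one string, it can be convenient to isolate the reasoning itself. The helper below is a minimal sketch, not part of the model's API. It assumes the `<think>` and `</think>` tags appear literally in the decoded text; whether they survive `skip_special_tokens=True` depends on the tokenizer configuration.

```python
import re


def extract_trace(decoded_text):
    """Return the text inside <think>...</think>.

    If the closing tag is missing (for example, generation hit the token
    limit), return everything after <think>. If no tag is found at all,
    return the whole text stripped of surrounding whitespace.
    """
    match = re.search(r"<think>(.*?)(?:</think>|\Z)", decoded_text, flags=re.DOTALL)
    return match.group(1).strip() if match else decoded_text.strip()


sample = "<think>\nstep 1\n</think>\nAnswer: 70"
print(extract_trace(sample))  # step 1
```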
Example: Mathematical Reasoning (AIME)
problem = "Find the sum of all integer bases b>9 for which 17_b is a divisor of 97_b."
trace = generate_thinking_trace(model, tokenizer, problem)
print(trace)
Output includes detailed reasoning:
<think>
Let me work through this problem step by step.
First, I need to understand what these numbers represent in decimal.
17_b = 1*b + 7 in decimal
97_b = 9*b + 7 in decimal
So we need (b + 7) to divide (9b + 7)...
[detailed mathematical derivation]
...Therefore, the valid bases are b = 21 and b = 49.
The sum is 21 + 49 = 70.
</think>
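The answer in the trace above can be checked independently with a brute-force search. Since 9b + 7 = 9(b + 7) - 56, the divisor b + 7 must divide 56, so no valid base exceeds 49; the search limit of 1000 below is an arbitrary safety margin chosen for this sketch.

```python
def valid_bases(limit=1000):
    """Bases b > 9 for which 17_b = b + 7 divides 97_b = 9b + 7."""
    return [b for b in range(10, limit + 1) if (9 * b + 7) % (b + 7) == 0]


bases = valid_bases()
print(bases, sum(bases))  # [21, 49] 70
```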
Example: Policy Violation Detection (CRM)
crm_task = """
Given Case ID: 500Wt00000DDzSnIAL
Issue: Customer experiencing scalability issues with QuantumPCB Modeler
Description: "We are concerned that the current scalability issues with QuantumPCB Modeler are limiting our enterprise growth."
Did the agent breach any policy? If yes, which knowledge article was violated?
"""
trace = generate_thinking_trace(model, tokenizer, crm_task)
Model Capabilities
- Mathematical Reasoning: Generates step-by-step solutions for complex problems
- Cross-Domain Transfer: Applies reasoning patterns to new domains
- Teaching Through Traces: Provides detailed explanations of thought processes
- Multi-Step Problem Solving: Handles problems requiring extensive reasoning chains
Limitations
- May generate overly detailed traces for simple problems
- Performance varies across different reasoning domains
- Requires significant computational resources for long traces
Citation
@misc{arc-teacher-8b,
  title={Arc-Teacher-8B: Teaching Through Reasoning with Reinforcement Learning},
  author={Arc Intelligence},
  year={2024},
  publisher={Hugging Face}
}
License
Apache 2.0
Acknowledgments
This model was trained using the bespoke-stratos-17k dataset for initial SFT and evaluated on the AIME 2025 and CRM-Arena benchmarks.