Arc-Teacher-8B

Arc-Teacher-8B is a specialized reasoning model trained with the Reinforcement Learning from Teacher (RLT) methodology. It scores 46% on AIME 2025 and transfers its reasoning to other domains, evaluated here on policy violation detection with CRM-Arena.

Model Details

Base Model: Qwen3-8B
Architecture: 8B parameters
Training Method: Teacher-Student Reinforcement Learning with Group Relative Policy Optimization (GRPO)
Primary Capability: Teaching through reasoning traces rather than direct problem solving

Training Process

SFT Warmup: Initial supervised fine-tuning on 7k samples from bespoke-stratos-17k dataset
RL Training: Teacher-student framework where the model learns to generate teaching traces
Cross-Domain Evaluation: Tested on the CRM-Arena dataset for policy violation detection
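
GRPO scores each sampled completion relative to the other completions drawn for the same prompt. A minimal sketch of that group-relative normalization follows. It is not the training code used for this model: the reward values, the function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` term are all illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper: each advantage is the reward minus the group mean,
    divided by the group standard deviation (eps avoids division by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four teaching traces sampled for one problem.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Traces that score above the group mean receive a positive advantage and are reinforced; those below the mean receive a negative one. If every trace in a group gets the same reward, all advantages are zero and the group contributes no learning signal.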

[Figure: Training Pipeline]

Performance

AIME 2025 Results

Accuracy: 46%
The model generates detailed thinking traces that guide problem-solving

Cross-Domain Transfer (CRM-Arena)

Accuracy on cases with an actual violation: 69.2%
Accuracy on cases without a violation (correctly identified as non-violations): 17%
Reasoning patterns carry over to a new domain, though the low non-violation figure suggests the model tends to over-flag violations
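
The two CRM-Arena figures are per-class accuracies: one computed only over cases that contain a violation, the other only over cases that do not. The sketch below shows how such numbers could be derived from labels and predictions. The function name `per_class_accuracy` and the toy data are hypothetical, and this is not the evaluation script used for the reported results.

```python
def per_class_accuracy(labels, predictions):
    """Return (accuracy on violation cases, accuracy on non-violation cases).

    labels / predictions are lists of booleans where True means "violation".
    """
    on_violations = [p for l, p in zip(labels, predictions) if l]
    on_clean = [p for l, p in zip(labels, predictions) if not l]
    # Share of real violations that were flagged.
    violation_acc = sum(on_violations) / len(on_violations) if on_violations else float("nan")
    # Share of clean cases that were correctly left unflagged.
    clean_acc = sum(not p for p in on_clean) / len(on_clean) if on_clean else float("nan")
    return violation_acc, clean_acc


# Toy data for illustration only.
labels = [True, True, False, False, False]
predictions = [True, False, True, True, False]
print(per_class_accuracy(labels, predictions))
```

Reporting both numbers matters because a model that flags almost everything scores well on the first and poorly on the second.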

Usage

Basic Generation

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "Arc-Intelligence/arc-teacher-8b",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Arc-Intelligence/arc-teacher-8b")

Generating Teaching Traces

The model is optimized for generating detailed reasoning traces. Use the following format:

def generate_thinking_trace(model, tokenizer, problem):
    prompt = f"""<think>
Let me work through this problem step by step.

Problem: {problem}

First, I need to understand what we're looking for...
"""
    
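    # Move the inputs to the model's device; with device_map="auto" the model may sit on a GPU.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)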
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=2000,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
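
The decoded string returned by `generate_thinking_trace` contains the prompt followed by the generated continuation. If only the reasoning is needed, it can be cut out with plain string operations. The helper below is a hypothetical sketch: the name `extract_think_block` is not part of the model's tooling, and it assumes the `<think>` and `</think>` markers survive decoding as literal text.

```python
def extract_think_block(text):
    """Return the text between <think> and </think>.

    Falls back to everything after <think> when the closing tag is missing
    (for example, when generation stopped at max_new_tokens), and to the
    whole string when no opening tag is present.
    """
    open_tag, close_tag = "<think>", "</think>"
    start = text.find(open_tag)
    if start == -1:
        return text.strip()
    start += len(open_tag)
    end = text.find(close_tag, start)
    body = text[start:] if end == -1 else text[start:end]
    return body.strip()


print(extract_think_block("<think>\nStep 1...\n</think>\nFinal answer"))
```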

Example: Mathematical Reasoning (AIME)

problem = "Find the sum of all integer bases b>9 for which 17_b is a divisor of 97_b."

trace = generate_thinking_trace(model, tokenizer, problem)
print(trace)

Output includes detailed reasoning:

<think>
Let me work through this problem step by step.

First, I need to understand what these numbers represent in decimal.
17_b = 1*b + 7 in decimal
97_b = 9*b + 7 in decimal

So we need (b + 7) to divide (9b + 7)...
[detailed mathematical derivation]
...Therefore, the valid bases are b = 21 and b = 49.
The sum is 21 + 49 = 70.
</think>
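
The final answer in this trace can be checked independently by brute force. The sketch below converts both numerals to decimal and tests divisibility for each candidate base. The upper search bound of 1000 is an arbitrary assumption; it is safe because 9b + 7 = 9(b + 7) - 56, so b + 7 must divide 56 and no base above 49 can qualify.

```python
def divides_in_base(b):
    """Check whether 17 (base b) divides 97 (base b)."""
    divisor = 1 * b + 7   # 17_b in decimal
    dividend = 9 * b + 7  # 97_b in decimal
    return dividend % divisor == 0


# Bases must exceed 9 so that the digit 9 is valid.
valid_bases = [b for b in range(10, 1001) if divides_in_base(b)]
print(valid_bases, sum(valid_bases))  # [21, 49] 70
```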

Example: Policy Violation Detection (CRM)

crm_task = """
Given Case ID: 500Wt00000DDzSnIAL
Issue: Customer experiencing scalability issues with QuantumPCB Modeler
Description: "We are concerned that the current scalability issues with QuantumPCB Modeler are limiting our enterprise growth."

Did the agent breach any policy? If yes, which knowledge article was violated?
"""

trace = generate_thinking_trace(model, tokenizer, crm_task)

Model Capabilities

Mathematical Reasoning: Generates step-by-step solutions for complex problems
Cross-Domain Transfer: Applies reasoning patterns to new domains
Teaching Through Traces: Provides detailed explanations of thought processes
Multi-Step Problem Solving: Handles problems requiring extensive reasoning chains

Limitations

May generate overly detailed traces for simple problems
Performance varies across different reasoning domains
Requires significant computational resources for long traces
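
As a rough illustration of the resource point, the memory needed for the weights alone can be estimated from the parameter count and the dtype. The sketch takes 8 billion as a round figure (the exact parameter count is an assumption) and 2 bytes per parameter for float16, as in the loading example. Activations and the key-value cache for long traces come on top of this and are not modeled here.

```python
def weight_memory_gib(n_params, bytes_per_param):
    """Approximate memory for model weights in GiB (weights only)."""
    return n_params * bytes_per_param / 2**30


# 8e9 parameters is a round-number assumption; float16 uses 2 bytes each.
print(f"{weight_memory_gib(8e9, 2):.1f} GiB")
```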

Citation

@misc{arc-teacher-8b,
  title={Arc-Teacher-8B: Teaching Through Reasoning with Reinforcement Learning},
  author={Arc Intelligence},
  year={2024},
  publisher={Hugging Face}
}

License

Apache 2.0

Acknowledgments

This model was trained using the bespoke-stratos-17k dataset for initial SFT and evaluated on AIME 2025 and CRM-Arena benchmarks.
