---
base_model: OctoAI/OctoThinker-3B
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- spiral
- self-play
- reinforcement-learning
- octothinker
- multi-agent
---

# SPIRAL OctoThinker-3B Multi-Agent Model

This model was trained using the SPIRAL (Self-Play Iterative Reinforcement learning for Adaptation and Learning) framework.

## Model Details

- **Base Model**: OctoAI/OctoThinker-3B
- **Training Framework**: SPIRAL
- **Checkpoint**: step_00192
- **Model Size**: 3B parameters
- **Training Date**: 2025-08-26

## Training Configuration

The model was trained with self-play on three game environments:

- KuhnPoker-v1
- TicTacToe-v0
- SimpleNegotiation-v1
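
To make the self-play idea concrete, here is a toy sketch in which one policy controls both seats of a TicTacToe game and each finished episode yields zero-sum rewards (+1 for the winner, -1 for the loser, 0 each on a draw). It is an illustration under stated assumptions, not SPIRAL's training code or the TicTacToe-v0 environment: `random_policy` is a hypothetical stand-in for the language model, the reward scheme is assumed, and no policy update (such as a PPO step) is shown.

```python
import random
from typing import Callable, Dict, List, Optional

Board = List[Optional[str]]           # 9 cells, row by row; None = empty
Policy = Callable[[Board, str], int]  # (board, mark to play) -> cell index

# The eight winning lines on a 3x3 board.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board: Board) -> Optional[str]:
    """Return 'X' or 'O' if that mark fills a winning line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def random_policy(rng: random.Random) -> Policy:
    """Hypothetical stand-in for the language model: plays a random legal move."""
    def act(board: Board, mark: str) -> int:
        legal = [i for i, cell in enumerate(board) if cell is None]
        return rng.choice(legal)
    return act


def self_play_episode(policy: Policy) -> Dict[str, int]:
    """Play one game in which the SAME policy controls both X and O.

    Returns zero-sum rewards: +1 winner, -1 loser, 0 for both on a draw.
    """
    board: Board = [None] * 9
    mark = "X"
    while winner(board) is None and None in board:
        board[policy(board, mark)] = mark
        mark = "O" if mark == "X" else "X"
    w = winner(board)
    if w is None:
        return {"X": 0, "O": 0}
    loser = "O" if w == "X" else "X"
    return {w: 1, loser: -1}


if __name__ == "__main__":
    policy = random_policy(random.Random(0))
    for _ in range(3):
        print(self_play_episode(policy))
```

Because both seats share one policy, the model always faces an opponent of its own current strength; in a real training setup, the moves and rewards recorded for both seats would feed the reinforcement-learning update.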

### Training Parameters

```json
{
  "learning_rate": "1e-6",
  "train_batch_size": 128,
  "num_ppo_epochs": 2,
  "temperature": 1.0,
  "max_model_len": 16384,
  "environments": [
    "KuhnPoker-v1",
    "TicTacToe-v0",
    "SimpleNegotiation-v1"
  ],
  "base_model": "OctoAI/OctoThinker-3B",
  "framework": "SPIRAL"
}
```
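
If you want to reuse these values in a script, note that `learning_rate` is serialized as a string (`"1e-6"`) while the other numeric fields are plain JSON numbers. A minimal sketch, assuming the JSON above is pasted verbatim into a string; `CONFIG_JSON` and the other variable names are illustrative and not part of any SPIRAL tooling:

```python
import json

# The training parameters listed above, pasted verbatim.
CONFIG_JSON = """
{
  "learning_rate": "1e-6",
  "train_batch_size": 128,
  "num_ppo_epochs": 2,
  "temperature": 1.0,
  "max_model_len": 16384,
  "environments": ["KuhnPoker-v1", "TicTacToe-v0", "SimpleNegotiation-v1"],
  "base_model": "OctoAI/OctoThinker-3B",
  "framework": "SPIRAL"
}
"""

config = json.loads(CONFIG_JSON)

# learning_rate is stored as a string, so convert it before any numeric use.
learning_rate = float(config["learning_rate"])

print(f"lr={learning_rate:g}, batch={config['train_batch_size']}, "
      f"ppo_epochs={config['num_ppo_epochs']}")
print("environments:", ", ".join(config["environments"]))
```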

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "the-acorn-ai/spiral-octothinker-3b-multi-step00192"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the `accelerate` package
)

# Tokenize the prompt and move it to the same device as the model
inputs = tokenizer("Your prompt here", return_tensors="pt").to(model.device)

# max_new_tokens bounds only the generated continuation, not the prompt
outputs = model.generate(**inputs, max_new_tokens=100)

# Decode only the newly generated tokens, skipping the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)
```

## License

This model is licensed under the Apache License 2.0.