the-acorn-ai/spiral-octothinker-3b-multi-step00192



---
base_model: OctoAI/OctoThinker-3B
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- spiral
- self-play
- reinforcement-learning
- octothinker
- multi-agent
---

# SPIRAL OctoThinker-3B Multi-Agent Model

This model is a checkpoint (step_00192) of OctoAI/OctoThinker-3B trained with the SPIRAL (Self-Play Iterative Reinforcement Learning for Adaptation and Learning) framework.

## Model Details

- Base Model: OctoAI/OctoThinker-3B
- Training Framework: SPIRAL
- Checkpoint: step_00192
- Model Size: 3B parameters
- Training Date: 2025-08-26

## Training Configuration

The model was trained with self-play on three environments:
- KuhnPoker-v1
- TicTacToe-v0
- SimpleNegotiation-v1

### Training Parameters
```json
{
  "learning_rate": "1e-6",
  "train_batch_size": 128,
  "num_ppo_epochs": 2,
  "temperature": 1.0,
  "max_model_len": 16384,
  "environments": [
    "KuhnPoker-v1",
    "TicTacToe-v0",
    "SimpleNegotiation-v1"
  ],
  "base_model": "OctoAI/OctoThinker-3B",
  "framework": "SPIRAL"
}
```
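
Note that `learning_rate` is stored as a JSON string rather than a number, so any consumer of this block has to convert it before use. The following is a minimal sketch of reading the parameters with the Python standard library; the variable names are illustrative and are not part of SPIRAL.

```python
import json

# Training parameters copied verbatim from the JSON block in this card.
CONFIG_JSON = """
{
  "learning_rate": "1e-6",
  "train_batch_size": 128,
  "num_ppo_epochs": 2,
  "temperature": 1.0,
  "max_model_len": 16384,
  "environments": [
    "KuhnPoker-v1",
    "TicTacToe-v0",
    "SimpleNegotiation-v1"
  ],
  "base_model": "OctoAI/OctoThinker-3B",
  "framework": "SPIRAL"
}
"""

config = json.loads(CONFIG_JSON)

# "learning_rate" is a string in the card, so convert it explicitly.
learning_rate = float(config["learning_rate"])

print(f"learning rate: {learning_rate:g}")
print(f"environments: {', '.join(config['environments'])}")
```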

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "the-acorn-ai/spiral-octothinker-3b-multi-step00192"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)

# Tokenize the prompt and move the tensors to the model's device
inputs = tokenizer("Your prompt here", return_tensors="pt").to(model.device)

# Generate up to 100 new tokens (max_new_tokens excludes the prompt length)
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
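
The training parameters list `max_model_len` as 16384 and `temperature` as 1.0. Assuming `max_model_len` is the total context budget (prompt plus generated tokens), which this card does not state explicitly, a caller may want to cap `max_new_tokens` so that a long prompt does not push generation past that limit. The helper below is a hypothetical sketch and not part of transformers or SPIRAL; it only produces `do_sample`, `temperature`, and `max_new_tokens`, which are standard `generate` arguments.

```python
MAX_MODEL_LEN = 16384    # "max_model_len" from the training parameters
TRAIN_TEMPERATURE = 1.0  # "temperature" from the training parameters


def build_generation_kwargs(prompt_len: int, requested_new_tokens: int = 512) -> dict:
    """Return keyword arguments for model.generate() that respect the context budget.

    Hypothetical helper: assumes MAX_MODEL_LEN covers prompt + generated tokens.
    """
    remaining = MAX_MODEL_LEN - prompt_len
    if remaining <= 0:
        raise ValueError(f"prompt of {prompt_len} tokens leaves no room to generate")
    return {
        "do_sample": True,
        "temperature": TRAIN_TEMPERATURE,
        "max_new_tokens": min(requested_new_tokens, remaining),
    }
```

With a tokenized prompt `inputs` and a loaded `model`, this could be called as `model.generate(**inputs, **build_generation_kwargs(inputs["input_ids"].shape[1]))`. Sampling at the training temperature is an illustrative choice here, not a recommendation made by this card.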

## License

This model is licensed under the Apache License 2.0.