---
datasets:
- nvidia/OpenCodeReasoning-2
- GetSoloTech/Code-Reasoning
base_model:
- openai/gpt-oss-20b
library_name: transformers
tags:
- code-reasoning
- vllm
pipeline_tag: text-generation
---

### Overview

- Base model: `openai/gpt-oss-20b`
- Objective: Supervised fine-tuning for competitive programming and algorithmic reasoning
- Dataset: `nvidia/OpenCodeReasoning-2` (OCR-2), combining the `python` and `cpp` splits. Each sample reconstructs the upstream question and uses the dataset's `r1_generation` as the assistant response
- Context length: 4096 tokens
- Training method: LoRA SFT via TRL `SFTTrainer`

### Intended Use

- Intended: Generating Python/C++ solutions and reasoning for competitive programming tasks
- Out of scope: Safety-critical applications; the model may hallucinate or produce incorrect or inefficient code

### Prompt Format

This model was trained in a chat format. Recommended structure:

```python
messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
```

If you prefer plain text, place the problem text after a brief instruction, but the chat format generally yields better results.

### Reasoning Effort

Specify the reasoning effort in `apply_chat_template` (supported values: `"low"`, `"medium"` (the default), or `"high"`):

```python
messages = [
    {"role": "system", "content": "Always respond in riddles"},
    {"role": "user", "content": "Explain why the meaning of life is 42"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    reasoning_effort="high",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))
```

### Quick Start (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "GetSoloTech/gpt-oss-code-reasoning-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

problem_text = """
You are given an array of integers ... (your problem here)
"""

messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]
input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    reasoning_effort="medium",
)
inputs = tokenizer([input_text], return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=768,
    temperature=0.3,
    top_p=0.9,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Generation Tips

- Reasoning style: Use a lower temperature (0.2–0.5) for clearer step-by-step reasoning
- Length: Use `max_new_tokens` of 512–1024 for full solutions; shorter for hints
- Stop tokens: If you only want the final code, post-process the model output to extract the last code block (see the sketch below this list)
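Continuing from the Quick Start example above, here is a minimal sketch of that extraction. It assumes the model wraps its final solution in a standard triple-backtick fence; `extract_last_code_block` is a hypothetical helper, not part of this repository:

```python
import re

def extract_last_code_block(output_text: str) -> str | None:
    """Hypothetical helper: return the body of the last fenced code block,
    or None if the output contains no fence."""
    # Matches bare ``` fences as well as language-tagged ones (e.g. ```python);
    # re.DOTALL lets the captured body span multiple lines.
    blocks = re.findall(r"```[^\n]*\n(.*?)```", output_text, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else None

decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
solution = extract_last_code_block(decoded)
```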
### Dataset Construction Notes

- Source: `nvidia/OpenCodeReasoning-2` with `python` and `cpp` splits
- For each split, the construction script:
  - Shuffles the split and selects up to `--take_samples` examples
  - Reconstructs the problem statement from the upstream benchmarks (TACO, APPS, DeepMind CodeContests, `open-r1/codeforces`)
  - Filters out rows with missing or empty questions or assistant responses
  - Builds chat-style `messages` and a formatted `text` field using the tokenizer's chat template
- The final training set is the concatenation of both splits, followed by an optional `train_test_split` according to `--eval_ratio`; a hedged sketch of this pipeline appears at the end of this card

### Acknowledgements

- Unsloth (`FastLanguageModel`) for efficient 4-bit loading and fast PEFT
- TRL (`SFTTrainer`) for straightforward supervised fine-tuning
- NVIDIA OpenCodeReasoning-2 and the upstream benchmarks (TACO, APPS, CodeContests, `open-r1/codeforces`)

---
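For reference, a minimal sketch of the pipeline described in the Dataset Construction Notes above. This is an illustration, not the actual training script: it assumes OCR-2 rows expose a `question` field alongside the `r1_generation` response column mentioned in the Overview, and `reconstruct_question`, `take_samples`, and `eval_ratio` are stand-ins for the script's real logic and CLI flags:

```python
from datasets import load_dataset, concatenate_datasets
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")

def reconstruct_question(row) -> str:
    # Hypothetical stand-in: the real script resolves the statement from the
    # upstream benchmark (TACO, APPS, CodeContests, open-r1/codeforces).
    return (row.get("question") or "").strip()

def build_split(split_name: str, take_samples: int, seed: int = 42):
    ds = load_dataset("nvidia/OpenCodeReasoning-2", split=split_name)
    # Shuffle and select up to `take_samples` examples.
    ds = ds.shuffle(seed=seed).select(range(min(take_samples, len(ds))))

    # Drop rows with a missing/empty question or assistant response.
    def keep(row) -> bool:
        return bool(reconstruct_question(row)) and bool((row.get("r1_generation") or "").strip())

    # Build chat-style `messages` and a formatted `text` field.
    def to_chat(row):
        messages = [
            {"role": "user", "content": reconstruct_question(row)},
            {"role": "assistant", "content": row["r1_generation"].strip()},
        ]
        return {
            "messages": messages,
            "text": tokenizer.apply_chat_template(messages, tokenize=False),
        }

    return ds.filter(keep).map(to_chat)

train = concatenate_datasets([
    build_split("python", take_samples=10_000),  # placeholder sample counts
    build_split("cpp", take_samples=10_000),
])
dataset = train.train_test_split(test_size=0.01)  # eval_ratio stand-in
```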