GetSoloTech / GPT-OSS-Code-Reasoning-20B

+---
+datasets:
+- nvidia/OpenCodeReasoning-2
+base_model:
+- openai/gpt-oss-20b
+library_name: transformers
+tags:
+- text-generation-inference
+- code
+---
+### Overview
+- Base model: `openai/gpt-oss-20b`
+- Objective: Supervised fine-tuning for competitive programming and algorithmic reasoning
+- Dataset: `nvidia/OpenCodeReasoning-2` (OCR-2), combining the `python` and `cpp` splits. For each sample, the question is reconstructed from its upstream source, and the dataset's `r1_generation` field is used as the assistant response
+- Context length: 4096 tokens
+- Training method: LoRA SFT via TRL `SFTTrainer`
+### Intended Use
+- Intended: Generating Python/C++ solutions and reasoning for competitive programming tasks
+- Out of scope: Safety-critical applications. The model may hallucinate or produce incorrect or inefficient code
+### Prompt Format
+This model was trained on chat-formatted data. The recommended message structure is:
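+```python
+from transformers import AutoTokenizer
+tokenizer = AutoTokenizer.from_pretrained("GetSoloTech/gpt-oss-code-reasoning-20b")
+problem_text = "..."  # replace with your problem statement
+messages = [
+    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
+    {"role": "user", "content": problem_text},
+]
+# Render the conversation as a prompt string that ends with the assistant turn
+prompt = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+)
+```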
+Plain-text prompts (a brief instruction followed by the problem text) also work, but the chat format matches how the model was trained and generally yields better results.
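A minimal sketch of the plain-text alternative follows. It assumes the instruction simply comes before the problem statement. The helper name `build_plain_prompt`, the default instruction, and the `Problem:` label are illustrative choices. The training data does not confirm this format.

```python
# Hypothetical helper: the instruction wording and "Problem:" label are assumptions.
DEFAULT_INSTRUCTION = (
    "You are an expert competitive programmer. Read the problem and "
    "produce a correct, efficient solution."
)


def build_plain_prompt(problem_text: str, instruction: str = DEFAULT_INSTRUCTION) -> str:
    """Return a plain-text prompt with a brief instruction before the problem."""
    problem = problem_text.strip()
    if not problem:
        raise ValueError("problem_text must not be empty")
    return f"{instruction}\n\nProblem:\n{problem}\n"


prompt = build_plain_prompt("Given n integers, print their sum.")
```

You can tokenize the resulting string directly with `tokenizer(prompt, return_tensors="pt")`, skipping `apply_chat_template`.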
+### Reasoning Effort
+Set the reasoning effort by passing `reasoning_effort` to `apply_chat_template`. Supported values are `"low"`, `"medium"` (default), and `"high"`:
+```python
+# Assumes `model` and `tokenizer` are already loaded as in Quick Start
+messages = [
+    {"role": "system", "content": "Always respond in riddles"},
+    {"role": "user", "content": "Explain why the meaning of life is 42"},
+]
+inputs = tokenizer.apply_chat_template(
+    messages,
+    add_generation_prompt=True,
+    return_tensors="pt",
+    return_dict=True,
+    reasoning_effort="high",
+).to(model.device)
+generated = model.generate(**inputs, max_new_tokens=500)
+print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))
+```
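The `decode` call above keeps special tokens. In that case the output is expected to follow the harmony response format used by gpt-oss: reasoning goes in an `analysis` channel and the answer in a `final` channel. The sketch below separates the two with a regular expression. It assumes the `<|channel|>`, `<|message|>`, `<|end|>`, and `<|return|>` markers appear verbatim in the decoded text, so check this against your own outputs. `split_channels` is a hypothetical helper and is not part of Transformers.

```python
import re
from typing import Dict

# Assumed harmony markers: <|channel|>NAME<|message|>BODY, where BODY ends at
# <|end|>, <|return|>, or the end of a truncated generation.
_SEGMENT = re.compile(
    r"<\|channel\|>(?P<channel>\w+)<\|message\|>(?P<body>.*?)(?=<\|end\|>|<\|return\|>|\Z)",
    re.DOTALL,
)


def split_channels(decoded: str) -> Dict[str, str]:
    """Map each channel name (e.g. "analysis", "final") to its message text."""
    channels: Dict[str, str] = {}
    for match in _SEGMENT.finditer(decoded):
        name = match.group("channel")
        body = match.group("body").strip()
        if name in channels:
            # Join repeated segments from the same channel.
            channels[name] += "\n" + body
        else:
            channels[name] = body
    return channels


sample = (
    "<|channel|>analysis<|message|>Try a prefix-sum approach.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>Use prefix sums.<|return|>"
)
parts = split_channels(sample)
```

Applied to the example above, `split_channels(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:])).get("final")` would return only the answer text, provided the output follows the assumed format.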
+### Quick Start (Transformers)
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+model_id = "GetSoloTech/gpt-oss-code-reasoning-20b"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype="auto",
+    device_map="auto",
+)
+problem_text = """
+You are given an array of integers ... (your problem here)
+"""
+messages = [
+    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
+    {"role": "user", "content": problem_text},
+]
+input_text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+    reasoning_effort="medium",
+)
+inputs = tokenizer([input_text], return_tensors="pt").to(model.device)
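+outputs = model.generate(
+    **inputs,
+    max_new_tokens=768,
+    do_sample=True,  # enable sampling so temperature/top_p take effect
+    temperature=0.3,
+    top_p=0.9,
+    repetition_penalty=1.1,
+)
+# Decode only the newly generated tokens, not the echoed prompt
+new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
+print(tokenizer.decode(new_tokens, skip_special_tokens=True))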
+```
+### Generation Tips
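+- Reasoning style: A lower temperature (0.2–0.5) tends to give clearer step-by-step reasoning. Temperature only applies when sampling is enabled (`do_sample=True`)
+- Length: Set `max_new_tokens` to 512–1024 for full solutions, or lower for hints. Higher reasoning effort spends more of this budget on reasoning before the answer
+- Code extraction: If you only want the final code, post-process the output to extract the last fenced code block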
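Here is a minimal sketch of the code-extraction tip, using a regular expression over Markdown-style fences. It assumes the model wraps code in triple-backtick fences with an optional language tag. `extract_last_code_block` is a hypothetical helper name.

```python
import re
from typing import Optional

# Three backticks, an optional language tag, a newline, then the body up to the
# next three backticks. The backtick run is written as `{3} in the pattern.
_FENCE = re.compile(r"`{3}[^\n`]*\n(.*?)`{3}", re.DOTALL)


def extract_last_code_block(text: str) -> Optional[str]:
    """Return the body of the last fenced code block, or None if none is found."""
    blocks = _FENCE.findall(text)
    if not blocks:
        return None
    return blocks[-1].rstrip("\n")


fence = "`" * 3
output = (
    f"Reasoning about the problem...\n{fence}python\nprint(1)\n{fence}\n"
    f"Final answer:\n{fence}cpp\nint main() {{ return 0; }}\n{fence}\n"
)
code = extract_last_code_block(output)
```

Taking the last block rather than the first skips scratch snippets that appear earlier in the reasoning. If the generation was truncated mid-block, the unclosed fence is ignored and the previous complete block is returned.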
+### Dataset Construction Notes
+- Source: `nvidia/OpenCodeReasoning-2` with `python` and `cpp` splits
+- For each split, the preprocessing script:
+  - Shuffles and selects up to `--take_samples` examples per split
+  - Reconstructs the problem statement from upstream benchmarks (TACO, APPS, DeepMind CodeContests, `open-r1/codeforces`)
+  - Filters out rows with missing/empty questions or assistant responses
+  - Builds chat-style `messages` and a formatted `text` field with the tokenizer's chat template
+- The final training set is the concatenation of both splits. When `--eval_ratio` is set, an evaluation split is held out with `train_test_split`
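The preprocessing steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the actual script:

- `lookup_question` is a hypothetical callable that stands in for reconstruction from the upstream benchmarks.
- Rows are assumed to be dicts that carry an `r1_generation` field.
- The card does not say whether a system message was included, so the sketch emits only user and assistant turns.
- The card's `train_test_split` step is replaced with a pure-Python split so the example runs without downloads.

```python
import random
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Example = Dict[str, List[Dict[str, str]]]


def _is_blank(value: Optional[str]) -> bool:
    """True for None, empty, or whitespace-only strings."""
    return value is None or not value.strip()


def build_split(
    rows: List[Row],
    take_samples: int,
    lookup_question: Callable[[Row], Optional[str]],  # hypothetical upstream lookup
    seed: int = 42,
) -> List[Example]:
    """Shuffle, keep up to take_samples rows, reconstruct questions, filter, build messages."""
    shuffled = list(rows)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    examples: List[Example] = []
    for row in shuffled[:take_samples]:
        question = lookup_question(row)
        answer = row.get("r1_generation")
        # Drop rows with a missing or empty question or assistant response.
        if _is_blank(question) or _is_blank(answer):
            continue
        examples.append({"messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]})
    return examples


def concat_and_split(
    python_examples: List[Example],
    cpp_examples: List[Example],
    eval_ratio: float = 0.0,
    seed: int = 42,
) -> Tuple[List[Example], List[Example]]:
    """Concatenate both splits, then optionally hold out eval_ratio of them."""
    combined = python_examples + cpp_examples  # new list; inputs are untouched
    if eval_ratio <= 0:
        return combined, []
    random.Random(seed).shuffle(combined)
    n_eval = max(1, int(len(combined) * eval_ratio))
    return combined[n_eval:], combined[:n_eval]
```

The formatted `text` field would come from `tokenizer.apply_chat_template(example["messages"], tokenize=False)`. That step is omitted here to avoid downloading the tokenizer.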
+### Acknowledgements
+- Unsloth (`FastLanguageModel`) for efficient 4-bit loading and fast PEFT
+- TRL (`SFTTrainer`) for straightforward supervised fine-tuning
+- NVIDIA OpenCodeReasoning-2 and upstream benchmarks (TACO, APPS, CodeContests, `open-r1/codeforces`)
+---