---
datasets:
- nvidia/OpenCodeReasoning-2
- GetSoloTech/Code-Reasoning
base_model:
- openai/gpt-oss-20b
library_name: transformers
tags:
- code-reasoning
- vllm
pipeline_tag: text-generation
---

### Overview

- Base model: `openai/gpt-oss-20b`
- Objective: Supervised fine-tuning for competitive programming and algorithmic reasoning
- Dataset: `nvidia/OpenCodeReasoning-2` (OCR-2), combining the `python` and `cpp` splits. Each sample reconstructs the upstream question and uses the dataset's `r1_generation` as the assistant response
- Context length: 4096 tokens
- Training method: LoRA SFT via TRL `SFTTrainer`

### Intended Use

- Intended: Generating Python/C++ solutions and reasoning for competitive programming tasks
- Out of scope: Safety-critical applications; the model may hallucinate or produce incorrect or inefficient code

### Prompt Format

This model was trained in a chat format. Recommended structure:

```python
messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
```

If you prefer plain text, place the problem text after a brief instruction, but the chat format generally yields better results.

### Reasoning Effort

Specify the reasoning effort in `apply_chat_template` (supported values: `"low"`, `"medium"` (the default), or `"high"`):

```python
messages = [
    {"role": "system", "content": "Always respond in riddles"},
    {"role": "user", "content": "Explain why the meaning of life is 42"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    reasoning_effort="high",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))
```

### Quick Start (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "GetSoloTech/gpt-oss-code-reasoning-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

problem_text = """
You are given an array of integers ... (your problem here)
"""

messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]
input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    reasoning_effort="medium",
)
inputs = tokenizer([input_text], return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=768,
    temperature=0.3,
    top_p=0.9,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Generation Tips

- Reasoning style: Use a lower temperature (0.2–0.5) for clearer step-by-step reasoning
- Length: Use `max_new_tokens` of 512–1024 for full solutions; shorter for hints
- Stop tokens: If you only want the final code, post-process the model output to extract the last code block (see the sketch below this list)
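Continuing from the Quick Start example above, here is a minimal sketch of that extraction. It assumes the model wraps its final solution in a standard triple-backtick fence; `extract_last_code_block` is a hypothetical helper, not part of this repository:

```python
import re

def extract_last_code_block(output_text: str) -> str | None:
    """Hypothetical helper: return the body of the last fenced code block,
    or None if the output contains no fence."""
    # Matches bare ``` fences as well as language-tagged ones (e.g. ```python);
    # re.DOTALL lets the captured body span multiple lines.
    blocks = re.findall(r"```[^\n]*\n(.*?)```", output_text, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else None

decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
solution = extract_last_code_block(decoded)
```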
### Dataset Construction Notes

- Source: `nvidia/OpenCodeReasoning-2` with `python` and `cpp` splits
- For each split, the construction script:
  - Shuffles the split and selects up to `--take_samples` examples
  - Reconstructs the problem statement from the upstream benchmarks (TACO, APPS, DeepMind CodeContests, `open-r1/codeforces`)
  - Filters out rows with missing or empty questions or assistant responses
  - Builds chat-style `messages` and a formatted `text` field using the tokenizer's chat template
- The final training set is the concatenation of both splits, followed by an optional `train_test_split` according to `--eval_ratio`; a hedged sketch of this pipeline appears at the end of this card

### Acknowledgements

- Unsloth (`FastLanguageModel`) for efficient 4-bit loading and fast PEFT
- TRL (`SFTTrainer`) for straightforward supervised fine-tuning
- NVIDIA OpenCodeReasoning-2 and the upstream benchmarks (TACO, APPS, CodeContests, `open-r1/codeforces`)

---
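For reference, a minimal sketch of the pipeline described in the Dataset Construction Notes above. This is an illustration, not the actual training script: it assumes OCR-2 rows expose a `question` field alongside the `r1_generation` response column mentioned in the Overview, and `reconstruct_question`, `take_samples`, and `eval_ratio` are stand-ins for the script's real logic and CLI flags:

```python
from datasets import load_dataset, concatenate_datasets
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")

def reconstruct_question(row) -> str:
    # Hypothetical stand-in: the real script resolves the statement from the
    # upstream benchmark (TACO, APPS, CodeContests, open-r1/codeforces).
    return (row.get("question") or "").strip()

def build_split(split_name: str, take_samples: int, seed: int = 42):
    ds = load_dataset("nvidia/OpenCodeReasoning-2", split=split_name)
    # Shuffle and select up to `take_samples` examples.
    ds = ds.shuffle(seed=seed).select(range(min(take_samples, len(ds))))

    # Drop rows with a missing/empty question or assistant response.
    def keep(row) -> bool:
        return bool(reconstruct_question(row)) and bool((row.get("r1_generation") or "").strip())

    # Build chat-style `messages` and a formatted `text` field.
    def to_chat(row):
        messages = [
            {"role": "user", "content": reconstruct_question(row)},
            {"role": "assistant", "content": row["r1_generation"].strip()},
        ]
        return {
            "messages": messages,
            "text": tokenizer.apply_chat_template(messages, tokenize=False),
        }

    return ds.filter(keep).map(to_chat)

train = concatenate_datasets([
    build_split("python", take_samples=10_000),  # placeholder sample counts
    build_split("cpp", take_samples=10_000),
])
dataset = train.train_test_split(test_size=0.01)  # eval_ratio stand-in
```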