---
datasets:
- nvidia/OpenCodeReasoning-2
base_model:
- openai/gpt-oss-20b
library_name: transformers
tags:
- text-generation-inference
- code
---

### Overview

- Base model: `openai/gpt-oss-20b`
- Objective: supervised fine-tuning for competitive programming and algorithmic reasoning
- Dataset: `nvidia/OpenCodeReasoning-2` (OCR-2), combining the `python` and `cpp` splits. Each sample reconstructs the upstream question and uses the dataset's `r1_generation` as the assistant response.
- Context length: 4096 tokens
- Training method: LoRA SFT via TRL's `SFTTrainer` (a rough sketch follows this list)

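The exact training script and hyperparameters have not been published. As a minimal sketch, a LoRA SFT run of this shape with TRL might look as follows; the LoRA rank, alpha, dropout, and `target_modules` are illustrative assumptions, the `text` field is the preprocessed one described under Dataset Construction Notes, and the actual run also used Unsloth for 4-bit loading:

```python
# Minimal sketch only: LoRA hyperparameters and module names are
# illustrative assumptions, not the published recipe.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Assumes the split has already been preprocessed into a chat-formatted
# "text" field as described in "Dataset Construction Notes" below.
train_dataset = load_dataset("nvidia/OpenCodeReasoning-2", split="python")

peft_config = LoraConfig(
    r=16,                   # illustrative rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # adjust to the base model
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="openai/gpt-oss-20b",
    train_dataset=train_dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="gpt-oss-code-reasoning-20b",
        max_seq_length=4096,        # matches the context length above
        dataset_text_field="text",
    ),
)
trainer.train()
```
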
### Intended Use

- Intended: generating Python/C++ solutions and reasoning for competitive programming tasks
- Out of scope: safety-critical applications; the model may hallucinate or produce incorrect or inefficient code

### Prompt Format

This model was trained in a chat format. Recommended structure:

```python
messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
```

If you prefer plain text, place the problem text after a brief instruction, but the chat format generally yields better results.
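For illustration, a minimal plain-text prompt could be as simple as the following (the instruction wording here is hypothetical):

```python
# Hypothetical plain-text prompt; the chat template above generally works better.
prompt = (
    "Solve the following competitive programming problem. "
    "Provide a correct, efficient solution.\n\n"
    + problem_text
)
```
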
### Reasoning Effort

Specify the reasoning effort in `apply_chat_template` (supported values: `"low"`, `"medium"` (default), or `"high"`):

```python
messages = [
    {"role": "system", "content": "Always respond in riddles"},
    {"role": "user", "content": "Explain why the meaning of life is 42"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    reasoning_effort="high",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1]:]))
```
### Quick Start (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "GetSoloTech/gpt-oss-code-reasoning-20b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",
)

problem_text = """
You are given an array of integers ... (your problem here)
"""

messages = [
    {"role": "system", "content": "You are an expert competitive programmer. Read the problem and produce a correct, efficient solution. Include reasoning if helpful."},
    {"role": "user", "content": problem_text},
]

input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    reasoning_effort="medium",
)

inputs = tokenizer([input_text], return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=768,
    temperature=0.3,
    top_p=0.9,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Generation Tips

- Reasoning style: use a lower temperature (0.2–0.5) for clearer step-by-step reasoning
- Length: use `max_new_tokens` of 512–1024 for full solutions; shorter for hints
- Stop tokens: if you only want the final code, consider post-processing the model output to extract the last code block (see the helper below)
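For that last tip, a small helper along these lines can pull out the final fenced code block (a sketch; it assumes solutions are emitted in Markdown-fenced blocks):

```python
import re

def extract_last_code_block(text: str) -> str | None:
    """Return the body of the last fenced code block in `text`, if any."""
    blocks = re.findall(r"```[^\n]*\n(.*?)```", text, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else None
```
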
### Dataset Construction Notes

- Source: `nvidia/OpenCodeReasoning-2` with the `python` and `cpp` splits
- For each split, the construction script:
  - Shuffles and selects up to `--take_samples` examples per split
  - Reconstructs the problem statement from the upstream benchmarks (TACO, APPS, DeepMind CodeContests, `open-r1/codeforces`)
  - Filters out rows with missing or empty questions or assistant responses
  - Builds chat-style `messages` and a formatted `text` field with the tokenizer's chat template
- The final training set is the concatenation of both splits, followed by an optional `train_test_split` according to `--eval_ratio` (a rough outline follows this list)
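The construction script itself was not published. A rough outline of the steps above might look like this; the `question` field name, the flag defaults, and the elided reconstruction step are assumptions, while `r1_generation` is the field the dataset actually provides:

```python
# Hypothetical outline of the preprocessing described above; flags and
# field names other than "r1_generation" are illustrative. The step that
# reconstructs questions from the upstream benchmarks is elided here.
from datasets import concatenate_datasets, load_dataset

def build_split(split_name, tokenizer, take_samples=10_000, seed=42):
    ds = load_dataset("nvidia/OpenCodeReasoning-2", split=split_name)
    ds = ds.shuffle(seed=seed).select(range(min(take_samples, len(ds))))
    # Drop rows with a missing/empty question or assistant response.
    ds = ds.filter(lambda r: r.get("question") and r.get("r1_generation"))

    def to_chat(row):
        messages = [
            {"role": "user", "content": row["question"]},            # reconstructed problem
            {"role": "assistant", "content": row["r1_generation"]},  # dataset's response
        ]
        return {
            "messages": messages,
            "text": tokenizer.apply_chat_template(messages, tokenize=False),
        }

    return ds.map(to_chat)

# train = concatenate_datasets([build_split("python", tok), build_split("cpp", tok)])
# train = train.train_test_split(test_size=eval_ratio)  # optional eval split
```
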
### Acknowledgements

- Unsloth (`FastLanguageModel`) for efficient 4-bit loading and fast PEFT
- TRL (`SFTTrainer`) for straightforward supervised fine-tuning
- NVIDIA OpenCodeReasoning-2 and the upstream benchmarks (TACO, APPS, CodeContests, `open-r1/codeforces`)

---