---
base_model: Qwen/Qwen2.5-1.5B-Instruct
library_name: transformers
model_name: MasterControlAIML/DeepSeek-R1-Qwen-2.5-1.5b
tags:
- generated_from_trainer
- trl
- grpo
- deepseek
- r1
license: apache-2.0
datasets:
- bhaviktheslider/JSON-Unstructured-Structured
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---

# Model Card for MasterControlAIML/DeepSeek-R1-Qwen-2.5-1.5b

This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct).
It has been trained using [TRL](https://github.com/huggingface/trl).

Datasets:
- MasterControlAIML/JSON-Unstructured-Structured

---

# DeepSeek R1 Strategy Replication on Qwen-2.5-1.5B (8×H100 GPUs)

*Problem: create structured JSON from unstructured text.*

*Input: unstructured text paragraphs plus blank-schema rules.*

*Output: a filled JSON built from the unstructured text, following the blank-schema rules.*

*Dataset (for more detail): https://huggingface.co/datasets/MasterControlAIML/JSON-Unstructured-Structured*

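To make the task shape concrete, here is a minimal sketch of what a blank schema and a filled result could look like. The field names and the `matches_schema` helper are illustrative assumptions, not taken from the dataset itself.

```python
import json

# Hypothetical example of the task shape (field names are illustrative,
# not taken from the actual dataset).
unstructured = "Order #1042 was placed by Ada Lovelace on 2024-05-01 for 3 widgets."

# A "blank schema" keeps the keys and value types but empties the values.
blank_schema = {"order_id": "", "customer": "", "date": "", "quantity": 0}

# The model's job is to fill the blank schema from the text.
filled = {"order_id": "1042", "customer": "Ada Lovelace", "date": "2024-05-01", "quantity": 3}

def matches_schema(output: dict, schema: dict) -> bool:
    """A filled output is well-formed only if it uses exactly the schema's keys."""
    return set(output) == set(schema)

print(json.dumps(filled, indent=2))
print(matches_schema(filled, blank_schema))  # True
```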
## Quick start

```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"

# Load the fine-tuned model as a chat-style text-generation pipeline on GPU.
generator = pipeline("text-generation", model="MasterControlAIML/DeepSeek-R1-Qwen-2.5-1.5b", device="cuda")

# Generate a reply to a single user message, returning only the new tokens.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
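The quick-start question above is generic; for the model's actual task you would pass the unstructured text together with the blank schema in the user message. A minimal sketch of assembling such a prompt (the wording is an assumption, not the exact training template):

```python
import json

blank_schema = {"name": "", "city": ""}  # hypothetical schema
text = "Grace Hopper lives in Arlington."

# Combine the blank schema and the source text into one user message;
# pass `messages` to the pipeline exactly as in the quick-start snippet.
prompt = (
    "Fill the following blank JSON schema using only facts from the text.\n"
    f"Schema: {json.dumps(blank_schema)}\n"
    f"Text: {text}"
)
messages = [{"role": "user", "content": prompt}]
print(messages[0]["content"])
```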

## Training procedure

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/bhavik18385-mastercontrol/grpo_training/runs/uyerl4vn)

This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).

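GRPO optimizes the policy against scalar rewards scored per completion. For a JSON-construction task, one natural reward checks whether a completion parses as JSON with the expected keys. The following is a hypothetical sketch of such a reward function, not the one used in this training run:

```python
import json

def json_reward(completion: str, schema_keys: set) -> float:
    """Score a completion: 1.0 for valid JSON with exactly the schema's
    keys, 0.5 for valid JSON with other keys, 0.0 otherwise."""
    try:
        parsed = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(parsed, dict):
        return 0.0
    return 1.0 if set(parsed) == schema_keys else 0.5

keys = {"name", "city"}
print(json_reward('{"name": "Ada", "city": "London"}', keys))  # 1.0
print(json_reward("not json", keys))                           # 0.0
```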
### Framework versions

- TRL: 0.14.0
- Transformers: 4.48.1
- PyTorch: 2.5.1+cu121
- Datasets: 3.1.0
- Tokenizers: 0.21.0

## Citations

Cite GRPO as:

```bibtex
@article{zhihong2024deepseekmath,
    title  = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year   = 2024,
    eprint = {arXiv:2402.03300},
}
```

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```