alamios committed (verified)
Commit c365740 · 1 Parent(s): d37bfff
.gitattributes CHANGED
@@ -34,3 +34,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-f16.gguf filter=lfs diff=lfs merge=lfs -text
+DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
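The two added rules route the new GGUF files through Git LFS, so the repository itself stores only small pointer files. As a quick sanity check (a sketch, assuming you run it from inside a clone of this repo), `git check-attr` reports which filter each path resolves to:

```python
import subprocess

# Expected output per file: "<name>: filter: lfs"
for name in [
    "DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-f16.gguf",
    "DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-Q8_0.gguf",
]:
    result = subprocess.run(
        ["git", "check-attr", "filter", "--", name],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())
```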
DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-Q4_K_M.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f70aaa2b5eb8b126489f16533b1171bf4516ec1f61a6b7be5807b8a7ddc4d2e6
-size 397932064
+oid sha256:a7ee5e0496b9f16e791f31529cad8398ebff61f8f034d7b6f634885a72fe0c83
+size 397932352
DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6cd8384ce8592bda1da205ed5d316f281e2b6aa4c48c8d7f9542dbb0a92f20c3
+size 531192640
DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-f16.gguf ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2cbfa3658c7bb59872cdbdf9fd8f28814d823615a1d9d1ec0d41cdca5874d33c
+size 994388800
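Each GGUF entry above is a Git LFS pointer: the tracked file holds only a spec version, a sha256 `oid`, and a byte `size`, while the weights themselves live in LFS storage. A minimal sketch of verifying a downloaded blob against its pointer (the local filename is assumed to match the repo filename):

```python
import hashlib

def verify_lfs_pointer(local_path: str, expected_oid: str, expected_size: int) -> bool:
    """Check a downloaded blob against the oid/size recorded in its LFS pointer."""
    sha = hashlib.sha256()
    size = 0
    with open(local_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # stream in 1 MiB chunks
            sha.update(chunk)
            size += len(chunk)
    return sha.hexdigest() == expected_oid and size == expected_size

# Values taken from the Q8_0 pointer above.
ok = verify_lfs_pointer(
    "DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-Q8_0.gguf",
    "6cd8384ce8592bda1da205ed5d316f281e2b6aa4c48c8d7f9542dbb0a92f20c3",
    531192640,
)
print("pointer matches" if ok else "mismatch: re-download the file")
```

The same check works for the updated Q4_K_M file using its new oid and size.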
README.md CHANGED
@@ -17,12 +17,14 @@ tags:
 
 # DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-GGUF
 
+**Updated**
+
 This model is trained on CODE outputs of <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B">deepseek-ai/DeepSeek-R1-Distill-Qwen-32B</a> and is meant to be used only as a draft model for speculative decoding.
 
 It's specifically intended for 3090/4090 users, allowing you to run the DeepSeek-R1-Distill-Qwen-32B-Q4_K_M GGUF with 16k context while speeding up generation, without sacrificing context length or model quality.
 
 # Data info
 
-The data consists of code tasks collected from various datasets. It has been trained for 4 epochs on 1400 unique examples, for a total of 4,600,000 tokens per epoch.
+The data consists of code tasks collected from various datasets. It has been trained for 2 epochs on 2.5k unique examples, for a total of 7.6 million tokens per epoch.
 
 Since data generation was done using spare GPU time, I may publish a further trained version later.
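For the speculative-decoding setup the README describes, the draft GGUF is passed alongside the 32B main model. A hedged sketch of launching llama.cpp's server this way (binary name, file paths, and exact flag spellings vary by build; verify against `llama-server --help`):

```python
import subprocess

# Sketch only: paths are hypothetical and flag names may differ across
# llama.cpp builds -- check `llama-server --help` before relying on them.
subprocess.run(
    [
        "llama-server",
        "-m", "DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf",           # main model
        "-md", "DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-Q4_K_M.gguf",  # this draft model
        "-c", "16384",   # the 16k context the README targets
        "-ngl", "99",    # offload all main-model layers to the 3090/4090
    ],
    check=True,
)
```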