File size: 6,381 Bytes
449b409
 
9d43f54
 
 
367d4cd
9d43f54
 
152a0ae
 
 
9d43f54
d14d85c
9d43f54
 
 
 
a4df28e
 
 
 
 
 
 
 
 
 
 
449b409
 
f2c879f
26ce83b
7be6e97
 
f542bd6
3e868b3
26ce83b
 
808357c
6fb2f7d
fc73f0b
 
 
808357c
8bc6e74
449b409
 
 
 
 
 
 
 
 
 
 
8bc6e74
449b409
9bdc3cf
808357c
 
e99486f
72a8b13
8bff418
713a2e9
26ce83b
60d6c07
 
5028d92
60d6c07
6b3a938
60d6c07
72a8b13
 
 
 
 
 
 
 
 
4188872
72a8b13
4188872
72a8b13
 
8bff418
72a8b13
808357c
72a8b13
713a2e9
 
fc8db83
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ff783bb
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
---
license: apache-2.0
language:
- en
base_model:
- DavidAU/Qwen2.5-Godzilla-Coder-51B
pipeline_tag: text-generation
tags:
- merge
- programming
- code generation
- code
- qwen2
- codeqwen
- chat
- qwen
- qwen-coder
- programming
- code generation
- code
- codeqwen
- moe
- coding
- coder
- qwen2
- chat
- qwen
- qwen-coder
---

<h2>Qwen2.5-Godzilla-Coder-51B-gguf</h2>

<img src="godzilla-coder.jpg" style="float:right; width:300px; height:500px; padding:10px;">

"It will pound your programming problems into the pavement... perfectly."

Tipping the scales at 101 layers and 1215 tensors... the monster lives.

Two monsters in fact.

Each model generates stronger, more compact code with an enhanced understanding of your instructions and follows what you tell them to the letter.

And then some.

These overpowered CODING ENGINEs are based on two of the best coder AIs:

"Qwen2.5-Coder-32B-Instruct" 

[ https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct ]

and

"OlympicCoder-32B"

[ https://huggingface.co/open-r1/OlympicCoder-32B ]

These two models are stuffed into one MASSSIVE 51B merge that is stronger in performance and understanding than both donor models.

Quants Q2_K, and Q4_K_S - one of each version - are available at the moment. 

These are unaltered quants for primary testing.

CONFIGS:
- #1 -> Qwen2.5-Coder-32B-Instruct primary/start, with OlympicCoder-32B as "finalizer". 
- #2 -> OlympicCoder-32B as primary/start, with Qwen2.5-Coder-32B-Instruct as "finalizer".

NOTES: 
- Each config/version will be very different from each other.
- Tool Calling is supported in both versions.
- Source(s) / full quanting to follow // full repos to follow.
- Model is fully operational at Q2k - both versions - and stronger than the base donor models in terms of raw performance.
- Final model size (including layers/tensors) / config subject to change.

---

Config / Settings

---

Model is set at 32k/32768 context for these GGUFS, full quants/full repos will be 128k/131072.

Requirements [Qwen 2.5 32B Coder default settings]:
- Temp .5 to .7 (or lower)
- topk: 20, topp: .8, minp: .05
- rep pen: 1.1 (can be lower)
- Jinja Template (embedded) or CHATML template.
- A System Prompt is not required. (ran tests with blank system prompt)

Refer to either "Qwen2.5-Coder-32B-Instruct" and/or "OlympicCoder-32B" repos (above) for additional settings, benchmarks and usage.

---

<H2>Help, Adjustments, Samplers, Parameters and More</H2>

---

<B>CHANGE THE NUMBER OF ACTIVE EXPERTS:</B>

See this document:

https://huggingface.co/DavidAU/How-To-Set-and-Manage-MOE-Mix-of-Experts-Model-Activation-of-Experts

<B>Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model:</B>

In "KoboldCpp" or  "oobabooga/text-generation-webui" or "Silly Tavern" ;

Set the "Smoothing_factor" to 1.5 

: in KoboldCpp -> Settings->Samplers->Advanced-> "Smooth_F"

: in text-generation-webui -> parameters -> lower right.

: In Silly Tavern this is called: "Smoothing"


NOTE: For "text-generation-webui" 

-> if using GGUFs you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model)

Source versions (and config files) of my models are here:

https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be

OTHER OPTIONS:

- Increase rep pen to 1.1 to 1.15 (you don't need to do this if you use "smoothing_factor")

- If the interface/program you are using to run AI MODELS supports "Quadratic Sampling" ("smoothing") just make the adjustment as noted.

<B>Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers</B>

This a "Class 1" model:

For all settings used for this model (including specifics for its "class"), including example generation(s) and for advanced settings guide (which many times addresses any model issue(s)), including methods to improve model performance for all use case(s) as well as chat, roleplay and other use case(s) please see:

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]

You can see all parameters used for generation, in addition to advanced parameters and samplers to get the most out of this model here:

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]


---

<h2>Special Thanks:</h2>

---

Special thanks to all the following, and many more...

All the model makers, fine tuners, mergers, and tweakers:
- Provides the raw "DNA" for almost all my models.
- Sources of model(s) can be found on the repo pages, especially the "source" repos with link(s) to the model creator(s).

Huggingface [ https://huggingface.co ] :
- The place to store, merge, and tune models endlessly.
- THE reason we have an open source community.

LlamaCPP [ https://github.com/ggml-org/llama.cpp ] :
- The ability to compress and run models on GPU(s), CPU(s) and almost all devices.
- Imatrix, Quantization, and other tools to tune the quants and the models.
- Llama-Server : A cli based direct interface to run GGUF models.
- The only tool I use to quant models.

Quant-Masters: Team Mradermacher, Bartowski, and many others:
- Quant models day and night for us all to use.
- They are the lifeblood of open source access.

MergeKit [ https://github.com/arcee-ai/mergekit ] :
- The universal online/offline tool to merge models together and forge something new.
- Over 20 methods to almost instantly merge model, pull them apart and put them together again.
- The tool I have used to create over 1500 models.

Lmstudio [ https://lmstudio.ai/ ] :
- The go to tool to test and run models in GGUF format.
- The Tool I use to test/refine and evaluate new models.
- LMStudio forum on discord; endless info and community for open source.

Text Generation Webui // KolboldCPP // SillyTavern:
- Excellent tools to run GGUF models with - [  https://github.com/oobabooga/text-generation-webui ] [ https://github.com/LostRuins/koboldcpp ] .
- Sillytavern [ https://github.com/SillyTavern/SillyTavern ] can be used with LMSTudio [ https://lmstudio.ai/ ] , TextGen [ https://github.com/oobabooga/text-generation-webui ], Kolboldcpp [ https://github.com/LostRuins/koboldcpp ], Llama-Server [part of LLAMAcpp] as a off the scale front end control system and interface to work with models.