MetaphoricalCode committed · verified
Commit 60034e8 · 1 Parent(s): 1db5f20

Upload 16 files
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,437 @@
+ ---
+ library_name: transformers
+ license: apache-2.0
+ datasets:
+ - anthracite-org/kalo-opus-instruct-22k-no-refusal
+ - Nopm/Opus_WritingStruct
+ - Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
+ - Gryphe/Sonnet3.5-Charcard-Roleplay
+ - Gryphe/ChatGPT-4o-Writing-Prompts
+ - Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
+ - Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
+ - nothingiisreal/Reddit-Dirty-And-WritingPrompts
+ - allura-org/Celeste-1.x-data-mixture
+ - cognitivecomputations/dolphin-2.9.3
+ base_model:
+ - EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2
+ base_model_relation: quantized
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: EVA-Qwen2.5-32B-SFFT-v0.1
+   results: []
+ ---
+ ## Quantized using the default exllamav3 (0.0.2) quantization process.
+
+ - Original model: https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2
+ - exllamav3: https://github.com/turboderp-org/exllamav3
+ ---
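
For reference, a minimal sketch of fetching this quant locally so an exllamav3-capable backend can load it from disk. The repo id below is a placeholder, not the actual repository name:

```python
# Minimal sketch: download this EXL3 quant for use with an exllamav3-capable backend.
# "your-namespace/EVA-Qwen2.5-32B-v0.2-exl3" is a hypothetical placeholder repo id.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="your-namespace/EVA-Qwen2.5-32B-v0.2-exl3",  # placeholder
    local_dir="EVA-Qwen2.5-32B-v0.2-exl3",
)
print("Model files downloaded to:", local_dir)
```
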
+ # EVA Qwen2.5-32B v0.2
+
+ <p>
+ An RP/storywriting specialist model: a full-parameter finetune of Qwen2.5-32B on a mixture of synthetic and natural data.<br>
+ It uses the Celeste 70B 0.1 data mixture, greatly expanded to improve the versatility, creativity and "flavor" of the resulting model.<br>
+ </p>
+
+ <p>Dedicated to Nev.</p>
+
+ <p><b>Version notes for 0.2</b>: The whole dataset was reprocessed due to a severe mistake in the previously used pipeline, which had left the data poisoned with a large number of non-Unicode characters. The result is no more weird generation artifacts and better stability. Major kudos to Cahvay for his work on fixing this critical issue.</p>
+
+ <p>
+ <p>Prompt format is ChatML.</p><br>
+ <h3>Recommended sampler values:</h3>
+ <ul>
+ <li>Temperature: 1</li>
+ <li>Min-P: 0.05</li>
+ <li>Top-A: 0.2</li>
+ <li>Repetition Penalty: 1.03</li>
+ </ul>
+
+ <h3>Recommended SillyTavern presets (via CalamitousFelicitousness):</h3>
+
+ - [Context](https://huggingface.co/EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1/blob/main/%5BChatML%5D%20Roleplay-v1.9%20Context.json)
+ - [Instruct and System Prompt](https://huggingface.co/EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1/blob/main/%5BChatML%5D%20Roleplay-v1.9%20Instruct.json)
+ </p>
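
Since the prompt format is plain ChatML, the chat template shipped with the tokenizer can be applied directly, and the sampler values above map onto common backend parameters. A minimal sketch (the repo id is a placeholder; Top-A in particular is only honoured by samplers that implement it):

```python
# Build a ChatML prompt with the bundled tokenizer and pair it with the
# recommended sampler values. Repo id is a hypothetical placeholder.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-namespace/EVA-Qwen2.5-32B-v0.2-exl3")

messages = [
    {"role": "system", "content": "You are a creative writing assistant."},
    {"role": "user", "content": "Write the opening paragraph of a noir short story."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

sampler_settings = {
    "temperature": 1.0,
    "min_p": 0.05,
    "top_a": 0.2,              # backend-specific; ignored by samplers without Top-A
    "repetition_penalty": 1.03,
}
print(prompt)
print(sampler_settings)
```
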
+
+ <p>
+ <br>
+ <h3>
+ Training data:
+ </h3>
+ <ul>
+ <li>The Celeste 70B 0.1 data mixture, minus the Opus Instruct subset. See that model's <a href=https://huggingface.co/nothingiisreal/L3.1-70B-Celeste-V0.1-BF16>card</a> for details.</li>
+ <li>Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.</li>
+ <li>A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe</li>
+ <li>A subset (2k rows) of Sonnet3.5-Charcards-Roleplay by Gryphe</li>
+ <li>Synthstruct and SynthRP datasets by Epiculous</li>
+ <li>A subset of Dolphin-2.9.3, including a filtered version of not_samantha and a small subset of systemchat.</li>
+ </ul>
+ <h3>
+ Training time and hardware:
+ </h3>
+ <ul><li>7 hours on 8xH100 SXM, provided by <a href=https://featherless.ai/>FeatherlessAI</a></li></ul><br>
+ </p>
+ <p>The model was created by Kearm, Auri and Cahvay.</p>
+ <h4>Special thanks:</h4><ul>
+ <li><b>to Cahvay for his work on investigating and reprocessing the corrupted dataset, removing the single biggest source of data poisoning.</b></li>
+ <li><b>to <a href=https://featherless.ai/>FeatherlessAI</a> for generously providing an 8xH100 SXM node for training this model</b></li>
+ <li>to Gryphe, Lemmy, Kalomaze, Nopm, Epiculous and CognitiveComputations for the data</li>
+ <li>and to Allura-org for support, feedback, beta-testing and doing quality control of EVA models.</li></ul>
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.4.1`
+ ```yaml
+ base_model: Qwen/Qwen2.5-32B
+
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false
+
+ plugins:
+ - axolotl.integrations.liger.LigerPlugin
+ liger_rope: true
+ liger_rms_norm: true
+ liger_swiglu: true
+ liger_fused_linear_cross_entropy: true
+
+ # plugins:
+ # - axolotl.integrations.spectrum.SpectrumPlugin
+
+ # spectrum_top_fraction: 0.5
+ # # Optional if using a pre-scanned model as your base_model. Useful if using a model mirror
+ # spectrum_model_name: Qwen/Qwen2.5-32B
+
+ datasets:
+ - path: datasets/Celeste_Filtered_utf8fix.jsonl
+   type: sharegpt
+ - path: datasets/deduped_not_samantha_norefusals.jsonl
+   type: sharegpt
+ - path: datasets/deduped_SynthRP-Gens_processed_ShareGPT_converted_cleaned.jsonl
+   type: sharegpt
+ - path: datasets/deduped_Synthstruct-Gens_processed_sharegpt_converted_cleaned.jsonl
+   type: sharegpt
+ - path: datasets/Gryphe-4o-WP-filtered-sharegpt_utf8fix.jsonl
+   type: sharegpt
+ - path: datasets/opus-instruct-22k-no_refusals-filtered_utf8fix.jsonl
+   type: sharegpt
+ - path: datasets/Sonnet3-5-charcard-names-filtered-sharegpt_utf8fix.jsonl
+   type: sharegpt
+ - path: datasets/SystemChat_subset_filtered_sharegpt_utf8fix.jsonl
+   type: sharegpt
+
+ chat_template: chatml
+ shuffle_merged_datasets: true
+ val_set_size: 0.001
+ output_dir: ./EVA-Qwen2.5-32B-SFFT-v0.1
+
+ sequence_len: 10240
+ sample_packing: true
+ eval_sample_packing: false
+ pad_to_sequence_len: true
+
+ # adapter: qlora
+ # lora_model_dir:
+ # lora_r: 64
+ # lora_alpha: 128
+ # lora_dropout: 0.05
+ # lora_target_linear: true
+ # peft_use_dora: true
+
+ unfrozen_parameters:
+ - ^lm_head.weight$
+ - ^model.embed_tokens.weight$
+ # mlp.down_proj layers
+ - model.layers.63.mlp.down_proj
+ - model.layers.49.mlp.down_proj
+ - model.layers.48.mlp.down_proj
+ - model.layers.45.mlp.down_proj
+ - model.layers.44.mlp.down_proj
+ - model.layers.47.mlp.down_proj
+ - model.layers.46.mlp.down_proj
+ - model.layers.43.mlp.down_proj
+ - model.layers.8.mlp.down_proj
+ - model.layers.11.mlp.down_proj
+ - model.layers.19.mlp.down_proj
+ - model.layers.35.mlp.down_proj
+ - model.layers.20.mlp.down_proj
+ - model.layers.52.mlp.down_proj
+ - model.layers.39.mlp.down_proj
+ - model.layers.62.mlp.down_proj
+ - model.layers.50.mlp.down_proj
+ - model.layers.29.mlp.down_proj
+ - model.layers.16.mlp.down_proj
+ - model.layers.28.mlp.down_proj
+ - model.layers.53.mlp.down_proj
+ - model.layers.30.mlp.down_proj
+ - model.layers.31.mlp.down_proj
+ - model.layers.32.mlp.down_proj
+ - model.layers.7.mlp.down_proj
+ - model.layers.36.mlp.down_proj
+ - model.layers.12.mlp.down_proj
+ - model.layers.18.mlp.down_proj
+ - model.layers.37.mlp.down_proj
+ - model.layers.38.mlp.down_proj
+ - model.layers.14.mlp.down_proj
+ - model.layers.13.mlp.down_proj
+ # mlp.gate_proj layers
+ - model.layers.43.mlp.gate_proj
+ - model.layers.61.mlp.gate_proj
+ - model.layers.60.mlp.gate_proj
+ - model.layers.44.mlp.gate_proj
+ - model.layers.62.mlp.gate_proj
+ - model.layers.28.mlp.gate_proj
+ - model.layers.29.mlp.gate_proj
+ - model.layers.45.mlp.gate_proj
+ - model.layers.37.mlp.gate_proj
+ - model.layers.35.mlp.gate_proj
+ - model.layers.59.mlp.gate_proj
+ - model.layers.36.mlp.gate_proj
+ - model.layers.30.mlp.gate_proj
+ - model.layers.48.mlp.gate_proj
+ - model.layers.38.mlp.gate_proj
+ - model.layers.27.mlp.gate_proj
+ - model.layers.31.mlp.gate_proj
+ - model.layers.34.mlp.gate_proj
+ - model.layers.58.mlp.gate_proj
+ - model.layers.33.mlp.gate_proj
+ - model.layers.39.mlp.gate_proj
+ - model.layers.26.mlp.gate_proj
+ - model.layers.32.mlp.gate_proj
+ - model.layers.46.mlp.gate_proj
+ - model.layers.42.mlp.gate_proj
+ - model.layers.49.mlp.gate_proj
+ - model.layers.57.mlp.gate_proj
+ - model.layers.50.mlp.gate_proj
+ - model.layers.47.mlp.gate_proj
+ - model.layers.56.mlp.gate_proj
+ - model.layers.63.mlp.gate_proj
+ - model.layers.55.mlp.gate_proj
+ # mlp.up_proj layers
+ - model.layers.61.mlp.up_proj
+ - model.layers.60.mlp.up_proj
+ - model.layers.32.mlp.up_proj
+ - model.layers.59.mlp.up_proj
+ - model.layers.58.mlp.up_proj
+ - model.layers.57.mlp.up_proj
+ - model.layers.44.mlp.up_proj
+ - model.layers.28.mlp.up_proj
+ - model.layers.35.mlp.up_proj
+ - model.layers.36.mlp.up_proj
+ - model.layers.29.mlp.up_proj
+ - model.layers.31.mlp.up_proj
+ - model.layers.34.mlp.up_proj
+ - model.layers.55.mlp.up_proj
+ - model.layers.49.mlp.up_proj
+ - model.layers.30.mlp.up_proj
+ - model.layers.53.mlp.up_proj
+ - model.layers.43.mlp.up_proj
+ - model.layers.56.mlp.up_proj
+ - model.layers.33.mlp.up_proj
+ - model.layers.54.mlp.up_proj
+ - model.layers.62.mlp.up_proj
+ - model.layers.27.mlp.up_proj
+ - model.layers.51.mlp.up_proj
+ - model.layers.52.mlp.up_proj
+ - model.layers.37.mlp.up_proj
+ - model.layers.45.mlp.up_proj
+ - model.layers.26.mlp.up_proj
+ - model.layers.42.mlp.up_proj
+ - model.layers.50.mlp.up_proj
+ - model.layers.48.mlp.up_proj
+ - model.layers.39.mlp.up_proj
+ # self_attn.k_proj layers
+ - model.layers.63.self_attn.k_proj
+ - model.layers.55.self_attn.k_proj
+ - model.layers.60.self_attn.k_proj
+ - model.layers.7.self_attn.k_proj
+ - model.layers.12.self_attn.k_proj
+ - model.layers.13.self_attn.k_proj
+ - model.layers.57.self_attn.k_proj
+ - model.layers.29.self_attn.k_proj
+ - model.layers.14.self_attn.k_proj
+ - model.layers.51.self_attn.k_proj
+ - model.layers.53.self_attn.k_proj
+ - model.layers.54.self_attn.k_proj
+ - model.layers.22.self_attn.k_proj
+ - model.layers.61.self_attn.k_proj
+ - model.layers.18.self_attn.k_proj
+ - model.layers.30.self_attn.k_proj
+ - model.layers.9.self_attn.k_proj
+ - model.layers.24.self_attn.k_proj
+ - model.layers.23.self_attn.k_proj
+ - model.layers.25.self_attn.k_proj
+ - model.layers.10.self_attn.k_proj
+ - model.layers.58.self_attn.k_proj
+ - model.layers.56.self_attn.k_proj
+ - model.layers.15.self_attn.k_proj
+ - model.layers.32.self_attn.k_proj
+ - model.layers.28.self_attn.k_proj
+ - model.layers.8.self_attn.k_proj
+ - model.layers.59.self_attn.k_proj
+ - model.layers.11.self_attn.k_proj
+ - model.layers.48.self_attn.k_proj
+ - model.layers.16.self_attn.k_proj
+ - model.layers.50.self_attn.k_proj
+ # self_attn.o_proj layers
+ - model.layers.15.self_attn.o_proj
+ - model.layers.23.self_attn.o_proj
+ - model.layers.31.self_attn.o_proj
+ - model.layers.30.self_attn.o_proj
+ - model.layers.18.self_attn.o_proj
+ - model.layers.24.self_attn.o_proj
+ - model.layers.17.self_attn.o_proj
+ - model.layers.28.self_attn.o_proj
+ - model.layers.34.self_attn.o_proj
+ - model.layers.33.self_attn.o_proj
+ - model.layers.25.self_attn.o_proj
+ - model.layers.12.self_attn.o_proj
+ - model.layers.14.self_attn.o_proj
+ - model.layers.29.self_attn.o_proj
+ - model.layers.16.self_attn.o_proj
+ - model.layers.26.self_attn.o_proj
+ - model.layers.22.self_attn.o_proj
+ - model.layers.27.self_attn.o_proj
+ - model.layers.35.self_attn.o_proj
+ - model.layers.20.self_attn.o_proj
+ - model.layers.13.self_attn.o_proj
+ - model.layers.36.self_attn.o_proj
+ - model.layers.19.self_attn.o_proj
+ - model.layers.37.self_attn.o_proj
+ - model.layers.21.self_attn.o_proj
+ - model.layers.11.self_attn.o_proj
+ - model.layers.54.self_attn.o_proj
+ - model.layers.5.self_attn.o_proj
+ - model.layers.38.self_attn.o_proj
+ - model.layers.6.self_attn.o_proj
+ - model.layers.8.self_attn.o_proj
+ - model.layers.9.self_attn.o_proj
+ # self_attn.q_proj layers
+ - model.layers.1.self_attn.q_proj
+ - model.layers.2.self_attn.q_proj
+ - model.layers.3.self_attn.q_proj
+ - model.layers.45.self_attn.q_proj
+ - model.layers.54.self_attn.q_proj
+ - model.layers.35.self_attn.q_proj
+ - model.layers.48.self_attn.q_proj
+ - model.layers.61.self_attn.q_proj
+ - model.layers.52.self_attn.q_proj
+ - model.layers.50.self_attn.q_proj
+ - model.layers.60.self_attn.q_proj
+ - model.layers.56.self_attn.q_proj
+ - model.layers.58.self_attn.q_proj
+ - model.layers.42.self_attn.q_proj
+ - model.layers.59.self_attn.q_proj
+ - model.layers.44.self_attn.q_proj
+ - model.layers.55.self_attn.q_proj
+ - model.layers.57.self_attn.q_proj
+ - model.layers.41.self_attn.q_proj
+ - model.layers.36.self_attn.q_proj
+ - model.layers.39.self_attn.q_proj
+ - model.layers.4.self_attn.q_proj
+ - model.layers.43.self_attn.q_proj
+ - model.layers.34.self_attn.q_proj
+ - model.layers.46.self_attn.q_proj
+ - model.layers.49.self_attn.q_proj
+ - model.layers.40.self_attn.q_proj
+ - model.layers.25.self_attn.q_proj
+ - model.layers.51.self_attn.q_proj
+ - model.layers.17.self_attn.q_proj
+ - model.layers.37.self_attn.q_proj
+ - model.layers.53.self_attn.q_proj
+ # self_attn.v_proj layers
+ - model.layers.55.self_attn.v_proj
+ - model.layers.31.self_attn.v_proj
+ - model.layers.47.self_attn.v_proj
+ - model.layers.45.self_attn.v_proj
+ - model.layers.49.self_attn.v_proj
+ - model.layers.48.self_attn.v_proj
+ - model.layers.15.self_attn.v_proj
+ - model.layers.30.self_attn.v_proj
+ - model.layers.7.self_attn.v_proj
+ - model.layers.44.self_attn.v_proj
+ - model.layers.29.self_attn.v_proj
+ - model.layers.51.self_attn.v_proj
+ - model.layers.50.self_attn.v_proj
+ - model.layers.14.self_attn.v_proj
+ - model.layers.54.self_attn.v_proj
+ - model.layers.32.self_attn.v_proj
+ - model.layers.43.self_attn.v_proj
+ - model.layers.10.self_attn.v_proj
+ - model.layers.46.self_attn.v_proj
+ - model.layers.38.self_attn.v_proj
+ - model.layers.57.self_attn.v_proj
+ - model.layers.22.self_attn.v_proj
+ - model.layers.39.self_attn.v_proj
+ - model.layers.6.self_attn.v_proj
+ - model.layers.23.self_attn.v_proj
+ - model.layers.58.self_attn.v_proj
+ - model.layers.53.self_attn.v_proj
+ - model.layers.40.self_attn.v_proj
+ - model.layers.24.self_attn.v_proj
+ - model.layers.9.self_attn.v_proj
+ - model.layers.25.self_attn.v_proj
+ - model.layers.5.self_attn.v_proj
+
+
+
+ wandb_project: EVA-Qwen2.5-32B-SFFT-v0.2
+ wandb_entity:
+ wandb_watch:
+ wandb_name: Unit-02
+ wandb_log_model:
+
+ gradient_accumulation_steps: 8
+ micro_batch_size: 1
+ num_epochs: 3
+ optimizer: paged_adamw_8bit
+ lr_scheduler: cosine
+ learning_rate: 0.00005
+ max_grad_norm: 3
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: "unsloth"
+ # gradient_checkpointing_kwargs:
+ # use_reentrant: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ warmup_steps: 20
+ evals_per_epoch: 4
+ saves_per_epoch: 4
+ save_safetensors: true
+ hub_model_id:
+ hub_strategy:
+ debug:
+ deepspeed: deepspeed_configs/zero3_bf16.json
+ weight_decay: 0.1
+ # fsdp:
+ # - full_shard
+ # - auto_wrap
+ # fsdp_config:
+ # fsdp_limit_all_gathers: true
+ # fsdp_sync_module_states: false
+ # fsdp_offload_params: true
+ # fsdp_cpu_ram_efficient_loading: true
+ # fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
+ # fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
+ # fsdp_activation_checkpointing: true
+ # fsdp_state_dict_type: SHARDED_STATE_DICT # Changed from FULL_STATE_DICT
+ # fsdp_sharding_strategy: FULL_SHARD
+ # fsdp_forward_prefetch: false # Added
+ # fsdp_backward_prefetch: "BACKWARD_PRE" # Added
+ # fsdp_backward_prefetch_limit: 1 # Added
+ # fsdp_mixed_precision: BF16 # Added
+ ```
+
+ </details><br>
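
The `unfrozen_parameters` list in the config above performs selective finetuning: gradients flow only through the listed projections plus `lm_head` and the embeddings (likely selected via a Spectrum scan, given the commented-out SpectrumPlugin section). A minimal PyTorch illustration of the idea, not axolotl's implementation, with the pattern list abbreviated to a few entries:

```python
# Sketch of selective unfreezing as configured by `unfrozen_parameters` above.
# Illustration only; the pattern list is abbreviated and dots are escaped.
import re
import torch
from transformers import AutoModelForCausalLM

unfrozen_patterns = [
    r"^lm_head\.weight$",
    r"^model\.embed_tokens\.weight$",
    r"^model\.layers\.63\.mlp\.down_proj",  # ...plus the remaining listed layers
]

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-32B", torch_dtype=torch.bfloat16)

for name, param in model.named_parameters():
    # Freeze everything, then re-enable gradients only for matching tensors.
    param.requires_grad = any(re.search(p, name) for p in unfrozen_patterns)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```
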
added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "</tool_call>": 151658,
+   "<tool_call>": 151657,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
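
These are Qwen2.5's standard added tokens; after loading the uploaded tokenizer they should resolve to the same ids. A quick sanity check (repo id is a placeholder):

```python
# Verify the bundled tokenizer maps added tokens to the ids in added_tokens.json.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-namespace/EVA-Qwen2.5-32B-v0.2-exl3")  # placeholder

assert tokenizer.convert_tokens_to_ids("<|endoftext|>") == 151643
assert tokenizer.convert_tokens_to_ids("<|im_start|>") == 151644
assert tokenizer.convert_tokens_to_ids("<|im_end|>") == 151645
print("Added-token ids match added_tokens.json")
```
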
config.json ADDED
@@ -0,0 +1,39 @@
+ {
+   "_name_or_path": "Qwen/Qwen2.5-32B",
+   "architectures": [
+     "Qwen2ForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "eos_token_id": 151643,
+   "hidden_act": "silu",
+   "hidden_size": 5120,
+   "initializer_range": 0.02,
+   "intermediate_size": 27648,
+   "max_position_embeddings": 131072,
+   "max_window_layers": 64,
+   "model_type": "qwen2",
+   "num_attention_heads": 40,
+   "num_hidden_layers": 64,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 1000000.0,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.45.1",
+   "use_cache": false,
+   "use_sliding_window": false,
+   "vocab_size": 152064,
+   "quantization_config": {
+     "quant_method": "exl3",
+     "version": "0.0.2",
+     "bits": 8.0,
+     "head_bits": 8,
+     "calibration": {
+       "rows": 100,
+       "cols": 2048
+     },
+     "out_scales": "auto"
+   }
+ }
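
The embedded `quantization_config` identifies this as an 8.0 bits-per-weight EXL3 quant with an 8-bit output head. A small sketch for inspecting it from a local download (the directory name assumes the earlier snapshot_download example):

```python
# Inspect the EXL3 quantization metadata embedded in config.json.
import json
from pathlib import Path

config = json.loads(Path("EVA-Qwen2.5-32B-v0.2-exl3/config.json").read_text())
quant = config["quantization_config"]

print(quant["quant_method"])  # "exl3"
print(quant["bits"])          # 8.0 bits per weight
print(quant["head_bits"])     # 8-bit output head
print(quant["calibration"])   # {"rows": 100, "cols": 2048}
```
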
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "bos_token_id": 151643,
+   "do_sample": true,
+   "eos_token_id": 151643,
+   "max_new_tokens": 2048,
+   "transformers_version": "4.45.1"
+ }
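
These defaults (sampling enabled, `<|endoftext|>` as bos/eos, 2048 new tokens) are picked up automatically, but they can also be read and overridden directly, for example to apply the card's recommended sampler values (repo id is a placeholder):

```python
# Read the shipped generation defaults and override them with the card's values.
from transformers import GenerationConfig

gen_config = GenerationConfig.from_pretrained("your-namespace/EVA-Qwen2.5-32B-v0.2-exl3")  # placeholder
print(gen_config.do_sample, gen_config.max_new_tokens)  # True 2048

gen_config.temperature = 1.0
gen_config.min_p = 0.05
gen_config.repetition_penalty = 1.03
```
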
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7174908ec5206a4768e2d13fd9dabbc7b6516f2074460e8149536f8a58547346
+ size 8387562984
model-00002-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ae7bb7268a1d731f0805794a5c57f40f800fad11d561d3318d73c4ff07d8e37e
+ size 8294088776
model-00003-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d906abae04af454f0783645517fbffd2342f73eb45662e07085f826e9e2b07ce
+ size 8294088776
model-00004-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eb51a2f99932875fc07e6baaf0945c509f3a486248a13a8775eff6bfd0b058d1
+ size 8585093880
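
The four shard entries above are Git LFS pointers; the actual weights resolve when the repository is downloaded. The sha256 recorded in each pointer can be used to verify a local copy, for example for the first shard:

```python
# Verify a downloaded shard against the sha256 from its LFS pointer.
# Assumes the repo was downloaded to ./EVA-Qwen2.5-32B-v0.2-exl3.
import hashlib
from pathlib import Path

expected = "7174908ec5206a4768e2d13fd9dabbc7b6516f2074460e8149536f8a58547346"
shard = Path("EVA-Qwen2.5-32B-v0.2-exl3/model-00001-of-00004.safetensors")

h = hashlib.sha256()
with shard.open("rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # stream in 1 MiB chunks
        h.update(chunk)

assert h.hexdigest() == expected, "shard 1 hash mismatch"
print("model-00001-of-00004.safetensors verified")
```
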
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
quantization_config.json ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
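
Note that both `eos_token` and `pad_token` are `<|endoftext|>` (the Qwen2.5 base convention), while the ChatML turn delimiter `<|im_end|>` is registered as an additional special token. A quick check after loading (repo id is a placeholder):

```python
# Confirm the special-token mapping from special_tokens_map.json.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-namespace/EVA-Qwen2.5-32B-v0.2-exl3")  # placeholder

print(tokenizer.eos_token)  # <|endoftext|>
print(tokenizer.pad_token)  # <|endoftext|>
print("<|im_end|>" in tokenizer.additional_special_tokens)  # True
```
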
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c0382117ea329cdf097041132f6d735924b697924d6f6fc3945713e96ce87539
+ size 7031645
tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
+ {
+   "add_bos_token": false,
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": { "content": "<|endoftext|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
+     "151644": { "content": "<|im_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
+     "151645": { "content": "<|im_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
+     "151646": { "content": "<|object_ref_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
+     "151647": { "content": "<|object_ref_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
+     "151648": { "content": "<|box_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
+     "151649": { "content": "<|box_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
+     "151650": { "content": "<|quad_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
+     "151651": { "content": "<|quad_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
+     "151652": { "content": "<|vision_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
+     "151653": { "content": "<|vision_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
+     "151654": { "content": "<|vision_pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
+     "151655": { "content": "<|image_pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
+     "151656": { "content": "<|video_pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
+     "151657": { "content": "<tool_call>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false },
+     "151658": { "content": "</tool_call>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false },
+     "151659": { "content": "<|fim_prefix|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false },
+     "151660": { "content": "<|fim_middle|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false },
+     "151661": { "content": "<|fim_suffix|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false },
+     "151662": { "content": "<|fim_pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false },
+     "151663": { "content": "<|repo_name|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false },
+     "151664": { "content": "<|file_sep|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "bos_token": null,
+   "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "model_max_length": 131072,
+   "pad_token": "<|endoftext|>",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }
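
The `chat_template` above is the standard ChatML Jinja template. Rendering it by hand shows exactly what the model sees; this small sketch uses `jinja2` directly and is equivalent to calling `tokenizer.apply_chat_template`:

```python
# Render the ChatML chat_template from tokenizer_config.json by hand.
# jinja2 is the engine transformers uses internally for chat templates.
from jinja2 import Template

chat_template = (
    "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}"
    "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
)

messages = [{"role": "user", "content": "Hello!"}]
prompt = Template(chat_template).render(messages=messages, add_generation_prompt=True)
print(prompt)
# <|im_start|>user
# Hello!<|im_end|>
# <|im_start|>assistant
```
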
vocab.json ADDED
The diff for this file is too large to render. See raw diff