Triangle104 committed · verified · Commit 17e63d7 · 1 Parent(s): bd34cf4

Update README.md

Files changed (1): README.md (+400 −0)
README.md CHANGED
@@ -21,6 +21,406 @@ tags:
This model was converted to GGUF format from [`EVA-UNIT-01/EVA-Qwen2.5-7B-v0.1`](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-7B-v0.1) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-7B-v0.1) for more details on the model.

---

Model details:

A RP/storywriting specialist model: a full-parameter finetune of Qwen2.5-7B on a mixture of synthetic and natural data.

It uses the Celeste 70B 0.1 data mixture, greatly expanding it to improve the versatility, creativity and "flavor" of the resulting model.

Version 0.1 notes:

The dataset was deduplicated and cleaned relative to version 0.0, and the learning rate was adjusted. The resulting model seems more stable, and the 0.0 problems with handling short inputs and min_p sampling seem to be mostly gone.
123
+
124
+
125
+
126
+
127
+
128
+
129
+
130
+
131
+ Will be retrained
132
+ once more, because this run crashed around e1.2 (out
133
+
134
+
135
+ of 3) (thanks,
136
+ DeepSpeed, really appreciate it), and it's still
137
+
138
+
139
+
140
+ somewhat
141
+ undertrained as a result.

Prompt format is ChatML.
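For reference, ChatML wraps each turn in `<|im_start|>`/`<|im_end|>` markers; the prompt text below is a placeholder, not from the model card:

```
<|im_start|>system
You are a creative storytelling assistant.<|im_end|>
<|im_start|>user
Write the opening scene of a short story.<|im_end|>
<|im_start|>assistant
```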

Recommended sampler values:

- Temperature: 0.87
- Top-P: 0.81
- Repetition Penalty: 1.03
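As a sketch, assuming llama.cpp is installed and a quantized GGUF file from this repo has been downloaded locally (the filename below is a placeholder), the values above map onto `llama-cli` flags like so:

```shell
# Placeholder filename; substitute the actual GGUF file from this repo.
llama-cli -m ./eva-qwen2.5-7b-v0.1-q4_k_m.gguf \
  --temp 0.87 --top-p 0.81 --repeat-penalty 1.03 \
  -cnv -p "You are a creative storytelling assistant."
```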

The model appears to prefer lower temperatures (0.9 and below). Min-P sampling seems to work now as well.
225
+
226
+
227
+
228
+
229
+
230
+
231
+
232
+
233
+ Recommended
234
+ SillyTavern presets (via CalamitousFelicitousness):
235
+
236
+
237
+
238
+
239
+
240
+
241
+
242
+
243
+
244
+
245
+
246
+
247
+
248
+
249
+
250
+
251
+
252
+
253
+
254
+
255
+ Context
256
+
257
+
258
+ Instruct and System
259
+ Prompt
260
+
261
+
262
+
263
+
264
+
265
+
266
+
267
+
268
+
269
+
270
+
271
+
272
+
273
+
274
+
275
+
276
+
277
+
278
+
279
+
280
+
281
+
282
+
283
+
284
+
285
+
286
+
287
+
288
+
289
+
290
+
291
+
292
+
293
+
294
+
295
+
296
+
297
+
298
+ Training data:
299
+
300
+
301
+
302
+
303
+
304
+
305
+
306
+
307
+ Celeste 70B 0.1 data
308
+ mixture minus Opus Instruct subset. See that model's card for
309
+ details.
310
+
311
+
312
+ Kalomaze's
313
+ Opus_Instruct_25k dataset, filtered for refusals.
314
+
315
+
316
+ A subset (1k rows)
317
+ of ChatGPT-4o-WritingPrompts by Gryphe
318
+
319
+
320
+ A subset (2k rows)
321
+ of Sonnet3.5-Charcards-Roleplay by Gryphe
322
+
323
+
324
+ A cleaned subset
325
+ (~3k rows) of shortstories_synthlabels by Auri
326
+
327
+
328
+ Synthstruct and
329
+ SynthRP datasets by Epiculous

Training time and hardware:

- 2 days on 4x3090Ti (locally)

The model was trained by Kearm and Auri.

Special thanks:

- to Gryphe, Lemmy, Kalomaze, Nopm and Epiculous for the data
- to Alpindale for helping with the FFT config for Qwen2.5
- and to InfermaticAI's community for their continued support for our endeavors

---

## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)
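A minimal sketch of the install step, assuming Homebrew is already set up:

```shell
brew install llama.cpp
```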