yuzhounie committed · Commit da9401d · verified · 1 parent: fee3357

End of training
Files changed (5)
  1. README.md +2 -1
  2. all_results.json +8 -0
  3. train_results.json +8 -0
  4. trainer_state.json +1303 -0
  5. training_loss.png +0 -0
README.md CHANGED
@@ -4,6 +4,7 @@ license: apache-2.0
 base_model: Qwen/Qwen2.5-Coder-32B-Instruct
 tags:
 - llama-factory
+- full
 - generated_from_trainer
 model-index:
 - name: SWE-BENCH-5k-first-2000-claude-search-replace-generation_qwen_code_32B_5k_first_2000_generation
@@ -15,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 # SWE-BENCH-5k-first-2000-claude-search-replace-generation_qwen_code_32B_5k_first_2000_generation
 
-This model is a fine-tuned version of [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) on an unknown dataset.
+This model is a fine-tuned version of [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) on the SWE-BENCH-5k-first-2000-claude-search-replace-generation dataset.
 
 ## Model description
 
all_results.json ADDED
@@ -0,0 +1,8 @@
+{
+    "epoch": 2.96398891966759,
+    "total_flos": 6.743893969836442e+16,
+    "train_loss": 0.3866574793226189,
+    "train_runtime": 24143.75,
+    "train_samples_per_second": 0.179,
+    "train_steps_per_second": 0.007
+}
train_results.json ADDED
@@ -0,0 +1,8 @@
+{
+    "epoch": 2.96398891966759,
+    "total_flos": 6.743893969836442e+16,
+    "train_loss": 0.3866574793226189,
+    "train_runtime": 24143.75,
+    "train_samples_per_second": 0.179,
+    "train_steps_per_second": 0.007
+}
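The run statistics above can be read back programmatically. Below is a minimal sketch (not part of the commit) that inlines the train_results.json shown above and derives the approximate number of samples seen from runtime and throughput:

```python
import json

# train_results.json content, copied verbatim from the diff above.
train_results = json.loads("""
{
    "epoch": 2.96398891966759,
    "total_flos": 6.743893969836442e+16,
    "train_loss": 0.3866574793226189,
    "train_runtime": 24143.75,
    "train_samples_per_second": 0.179,
    "train_steps_per_second": 0.007
}
""")

# train_runtime is in seconds, so throughput * runtime approximates the
# total number of samples processed across the ~3 epochs of the run.
samples_seen = train_results["train_runtime"] * train_results["train_samples_per_second"]
print(f"approx. samples seen: {samples_seen:.0f}")
print(f"final average train loss: {train_results['train_loss']:.4f}")
```

Note the reported per-second rates are rounded, so quantities derived from them are approximate.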
trainer_state.json ADDED
@@ -0,0 +1,1303 @@
+{
+    "best_global_step": null,
+    "best_metric": null,
+    "best_model_checkpoint": null,
+    "epoch": 2.96398891966759,
+    "eval_steps": 500,
+    "global_step": 180,
+    "is_hyper_param_search": false,
+    "is_local_process_zero": true,
+    "is_world_process_zero": true,
+    "log_history": [
+        {"epoch": 0.01662049861495845, "grad_norm": 1.9652302265167236, "learning_rate": 0.0, "loss": 0.7152, "step": 1},
+        {"epoch": 0.0332409972299169, "grad_norm": 2.135629177093506, "learning_rate": 5.555555555555555e-07, "loss": 0.7024, "step": 2},
+        {"epoch": 0.04986149584487535, "grad_norm": 2.365844964981079, "learning_rate": 1.111111111111111e-06, "loss": 0.7755, "step": 3},
+        {"epoch": 0.0664819944598338, "grad_norm": 1.939900517463684, "learning_rate": 1.6666666666666667e-06, "loss": 0.7134, "step": 4},
+        {"epoch": 0.08310249307479224, "grad_norm": 1.8507870435714722, "learning_rate": 2.222222222222222e-06, "loss": 0.6644, "step": 5},
+        {"epoch": 0.0997229916897507, "grad_norm": 1.8390847444534302, "learning_rate": 2.7777777777777783e-06, "loss": 0.7306, "step": 6},
+        {"epoch": 0.11634349030470914, "grad_norm": 1.2149966955184937, "learning_rate": 3.3333333333333333e-06, "loss": 0.5377, "step": 7},
+        {"epoch": 0.1329639889196676, "grad_norm": 1.203329086303711, "learning_rate": 3.88888888888889e-06, "loss": 0.6448, "step": 8},
+        {"epoch": 0.14958448753462603, "grad_norm": 1.1259090900421143, "learning_rate": 4.444444444444444e-06, "loss": 0.6041, "step": 9},
+        {"epoch": 0.16620498614958448, "grad_norm": 0.9785488247871399, "learning_rate": 5e-06, "loss": 0.6802, "step": 10},
+        {"epoch": 0.18282548476454294, "grad_norm": 0.7702904343605042, "learning_rate": 5.555555555555557e-06, "loss": 0.5737, "step": 11},
+        {"epoch": 0.1994459833795014, "grad_norm": 0.7972448468208313, "learning_rate": 6.111111111111112e-06, "loss": 0.6071, "step": 12},
+        {"epoch": 0.21606648199445982, "grad_norm": 0.8643639087677002, "learning_rate": 6.666666666666667e-06, "loss": 0.5645, "step": 13},
+        {"epoch": 0.23268698060941828, "grad_norm": 0.822340190410614, "learning_rate": 7.222222222222223e-06, "loss": 0.5512, "step": 14},
+        {"epoch": 0.24930747922437674, "grad_norm": 1.0604660511016846, "learning_rate": 7.77777777777778e-06, "loss": 0.5875, "step": 15},
+        {"epoch": 0.2659279778393352, "grad_norm": 0.8126739263534546, "learning_rate": 8.333333333333334e-06, "loss": 0.5601, "step": 16},
+        {"epoch": 0.28254847645429365, "grad_norm": 0.7240079641342163, "learning_rate": 8.888888888888888e-06, "loss": 0.5724, "step": 17},
+        {"epoch": 0.29916897506925205, "grad_norm": 0.6566236615180969, "learning_rate": 9.444444444444445e-06, "loss": 0.5535, "step": 18},
+        {"epoch": 0.3157894736842105, "grad_norm": 0.7229272723197937, "learning_rate": 1e-05, "loss": 0.5413, "step": 19},
+        {"epoch": 0.33240997229916897, "grad_norm": 0.6160261034965515, "learning_rate": 9.999059852242508e-06, "loss": 0.4809, "step": 20},
+        {"epoch": 0.3490304709141274, "grad_norm": 0.5426657199859619, "learning_rate": 9.996239762521152e-06, "loss": 0.4453, "step": 21},
+        {"epoch": 0.3656509695290859, "grad_norm": 0.6986624002456665, "learning_rate": 9.991540791356342e-06, "loss": 0.5704, "step": 22},
+        {"epoch": 0.38227146814404434, "grad_norm": 0.6466948986053467, "learning_rate": 9.98496470583896e-06, "loss": 0.5222, "step": 23},
+        {"epoch": 0.3988919667590028, "grad_norm": 0.5881003141403198, "learning_rate": 9.976513978965829e-06, "loss": 0.4903, "step": 24},
+        {"epoch": 0.4155124653739612, "grad_norm": 0.5835773348808289, "learning_rate": 9.966191788709716e-06, "loss": 0.4936, "step": 25},
+        {"epoch": 0.43213296398891965, "grad_norm": 0.5974717736244202, "learning_rate": 9.954002016824226e-06, "loss": 0.544, "step": 26},
+        {"epoch": 0.4487534626038781, "grad_norm": 0.6126233339309692, "learning_rate": 9.939949247384046e-06, "loss": 0.5313, "step": 27},
+        {"epoch": 0.46537396121883656, "grad_norm": 0.5605891942977905, "learning_rate": 9.924038765061042e-06, "loss": 0.5121, "step": 28},
+        {"epoch": 0.481994459833795, "grad_norm": 0.523395299911499, "learning_rate": 9.906276553136924e-06, "loss": 0.4705, "step": 29},
+        {"epoch": 0.4986149584487535, "grad_norm": 0.5597982406616211, "learning_rate": 9.886669291253178e-06, "loss": 0.4951, "step": 30},
+        {"epoch": 0.5152354570637119, "grad_norm": 0.5273374915122986, "learning_rate": 9.86522435289912e-06, "loss": 0.4763, "step": 31},
+        {"epoch": 0.5318559556786704, "grad_norm": 0.5255304574966431, "learning_rate": 9.841949802639031e-06, "loss": 0.5133, "step": 32},
+        {"epoch": 0.5484764542936288, "grad_norm": 0.8223831057548523, "learning_rate": 9.816854393079402e-06, "loss": 0.4865, "step": 33},
+        {"epoch": 0.5650969529085873, "grad_norm": 0.4619203805923462, "learning_rate": 9.789947561577445e-06, "loss": 0.4631, "step": 34},
+        {"epoch": 0.5817174515235457, "grad_norm": 0.4974648654460907, "learning_rate": 9.761239426692077e-06, "loss": 0.5039, "step": 35},
+        {"epoch": 0.5983379501385041, "grad_norm": 0.5178198218345642, "learning_rate": 9.730740784378755e-06, "loss": 0.4618, "step": 36},
+        {"epoch": 0.6149584487534626, "grad_norm": 0.5592218637466431, "learning_rate": 9.698463103929542e-06, "loss": 0.4777, "step": 37},
+        {"epoch": 0.631578947368421, "grad_norm": 0.4956098198890686, "learning_rate": 9.664418523660004e-06, "loss": 0.4925, "step": 38},
+        {"epoch": 0.6481994459833795, "grad_norm": 0.48805150389671326, "learning_rate": 9.628619846344453e-06, "loss": 0.4423, "step": 39},
+        {"epoch": 0.6648199445983379, "grad_norm": 0.5749639868736267, "learning_rate": 9.591080534401371e-06, "loss": 0.55, "step": 40},
+        {"epoch": 0.6814404432132964, "grad_norm": 0.7393980622291565, "learning_rate": 9.551814704830734e-06, "loss": 0.426, "step": 41},
+        {"epoch": 0.6980609418282548, "grad_norm": 0.5011327862739563, "learning_rate": 9.51083712390519e-06, "loss": 0.4628, "step": 42},
+        {"epoch": 0.7146814404432132, "grad_norm": 0.572926938533783, "learning_rate": 9.468163201617063e-06, "loss": 0.527, "step": 43},
+        {"epoch": 0.7313019390581718, "grad_norm": 0.5243227481842041, "learning_rate": 9.423808985883289e-06, "loss": 0.5115, "step": 44},
+        {"epoch": 0.7479224376731302, "grad_norm": 0.5271593928337097, "learning_rate": 9.377791156510456e-06, "loss": 0.4921, "step": 45},
+        {"epoch": 0.7645429362880887, "grad_norm": 0.5143831968307495, "learning_rate": 9.330127018922195e-06, "loss": 0.4842, "step": 46},
+        {"epoch": 0.7811634349030471, "grad_norm": 0.5135733485221863, "learning_rate": 9.280834497651334e-06, "loss": 0.4939, "step": 47},
+        {"epoch": 0.7977839335180056, "grad_norm": 0.5173041820526123, "learning_rate": 9.229932129599206e-06, "loss": 0.4819, "step": 48},
+        {"epoch": 0.814404432132964, "grad_norm": 0.570851743221283, "learning_rate": 9.177439057064684e-06, "loss": 0.5439, "step": 49},
+        {"epoch": 0.8310249307479224, "grad_norm": 0.552671492099762, "learning_rate": 9.123375020545534e-06, "loss": 0.4669, "step": 50},
+        {"epoch": 0.8476454293628809, "grad_norm": 0.5668032765388489, "learning_rate": 9.067760351314838e-06, "loss": 0.5138, "step": 51},
+        {"epoch": 0.8642659279778393, "grad_norm": 0.48532989621162415, "learning_rate": 9.01061596377522e-06, "loss": 0.4827, "step": 52},
+        {"epoch": 0.8808864265927978, "grad_norm": 0.4953126311302185, "learning_rate": 8.951963347593797e-06, "loss": 0.4273, "step": 53},
+        {"epoch": 0.8975069252077562, "grad_norm": 0.5042351484298706, "learning_rate": 8.891824559620801e-06, "loss": 0.5311, "step": 54},
+        {"epoch": 0.9141274238227147, "grad_norm": 0.532244086265564, "learning_rate": 8.83022221559489e-06, "loss": 0.5364, "step": 55},
+        {"epoch": 0.9307479224376731, "grad_norm": 0.5507211089134216, "learning_rate": 8.767179481638303e-06, "loss": 0.5264, "step": 56},
+        {"epoch": 0.9473684210526315, "grad_norm": 0.5117627382278442, "learning_rate": 8.702720065545024e-06, "loss": 0.4994, "step": 57},
+        {"epoch": 0.96398891966759, "grad_norm": 0.6424684524536133, "learning_rate": 8.636868207865244e-06, "loss": 0.5321, "step": 58},
+        {"epoch": 0.9806094182825484, "grad_norm": 0.5632804036140442, "learning_rate": 8.569648672789496e-06, "loss": 0.5354, "step": 59},
+        {"epoch": 0.997229916897507, "grad_norm": 0.5519580841064453, "learning_rate": 8.501086738835843e-06, "loss": 0.5502, "step": 60},
+        {"epoch": 1.0, "grad_norm": 0.5519580841064453, "learning_rate": 8.43120818934367e-06, "loss": 0.4298, "step": 61},
+        {"epoch": 1.0166204986149585, "grad_norm": 1.4024403095245361, "learning_rate": 8.360039302777614e-06, "loss": 0.3848, "step": 62},
+        {"epoch": 1.0332409972299168, "grad_norm": 0.4745033085346222, "learning_rate": 8.28760684284532e-06, "loss": 0.4, "step": 63},
+        {"epoch": 1.0498614958448753, "grad_norm": 0.5079669952392578, "learning_rate": 8.213938048432697e-06, "loss": 0.3824, "step": 64},
+        {"epoch": 1.0664819944598338, "grad_norm": 0.49697190523147583, "learning_rate": 8.139060623360494e-06, "loss": 0.4243, "step": 65},
+        {"epoch": 1.0831024930747923, "grad_norm": 0.4616394639015198, "learning_rate": 8.063002725966014e-06, "loss": 0.3888, "step": 66},
+        {"epoch": 1.0997229916897506, "grad_norm": 0.4260391294956207, "learning_rate": 7.985792958513932e-06, "loss": 0.3406, "step": 67},
+        {"epoch": 1.1163434903047091, "grad_norm": 0.47153493762016296, "learning_rate": 7.907460356440133e-06, "loss": 0.3636, "step": 68},
+        {"epoch": 1.1329639889196677, "grad_norm": 0.5076174139976501, "learning_rate": 7.828034377432694e-06, "loss": 0.4166, "step": 69},
+        {"epoch": 1.149584487534626, "grad_norm": 0.5310080647468567, "learning_rate": 7.747544890354031e-06, "loss": 0.4311, "step": 70},
+        {"epoch": 1.1662049861495845, "grad_norm": 0.5010002851486206, "learning_rate": 7.666022164008458e-06, "loss": 0.3193, "step": 71},
+        {"epoch": 1.182825484764543, "grad_norm": 0.49259936809539795, "learning_rate": 7.5834968557593155e-06, "loss": 0.3456, "step": 72},
+        {"epoch": 1.1994459833795015, "grad_norm": 0.5213885307312012, "learning_rate": 7.500000000000001e-06, "loss": 0.3615, "step": 73},
+        {"epoch": 1.2160664819944598, "grad_norm": 0.512752115726471, "learning_rate": 7.415562996483193e-06, "loss": 0.3569, "step": 74},
+        {"epoch": 1.2326869806094183, "grad_norm": 0.5139035582542419, "learning_rate": 7.330217598512696e-06, "loss": 0.3859, "step": 75},
+        {"epoch": 1.2493074792243768, "grad_norm": 0.5561084151268005, "learning_rate": 7.243995901002312e-06, "loss": 0.363, "step": 76},
+        {"epoch": 1.2659279778393353, "grad_norm": 0.49844229221343994, "learning_rate": 7.156930328406268e-06, "loss": 0.3648, "step": 77},
+        {"epoch": 1.2825484764542936, "grad_norm": 0.5111745595932007, "learning_rate": 7.069053622525697e-06, "loss": 0.3453, "step": 78},
+        {"epoch": 1.299168975069252, "grad_norm": 0.5968831777572632, "learning_rate": 6.980398830195785e-06, "loss": 0.3601, "step": 79},
+        {"epoch": 1.3157894736842106, "grad_norm": 0.3998188376426697, "learning_rate": 6.890999290858213e-06, "loss": 0.2965, "step": 80},
+        {"epoch": 1.332409972299169, "grad_norm": 0.5044348239898682, "learning_rate": 6.800888624023552e-06, "loss": 0.3579, "step": 81},
+        {"epoch": 1.3490304709141274, "grad_norm": 0.499636709690094, "learning_rate": 6.710100716628345e-06, "loss": 0.3751, "step": 82},
+        {"epoch": 1.365650969529086, "grad_norm": 0.5045871734619141, "learning_rate": 6.618669710291607e-06, "loss": 0.3782, "step": 83},
+        {"epoch": 1.3822714681440442, "grad_norm": 0.5296726822853088, "learning_rate": 6.526629988475567e-06, "loss": 0.413, "step": 84},
+        {"epoch": 1.3988919667590027, "grad_norm": 0.5541542768478394, "learning_rate": 6.434016163555452e-06, "loss": 0.4176, "step": 85},
+        {"epoch": 1.4155124653739612, "grad_norm": 0.52264803647995, "learning_rate": 6.340863063803187e-06, "loss": 0.3687, "step": 86},
+        {"epoch": 1.4321329639889195, "grad_norm": 0.5726013779640198, "learning_rate": 6.247205720289907e-06, "loss": 0.4127, "step": 87},
+        {"epoch": 1.448753462603878, "grad_norm": 0.5129911303520203, "learning_rate": 6.153079353712201e-06, "loss": 0.3608, "step": 88},
+        {"epoch": 1.4653739612188366, "grad_norm": 0.5869404673576355, "learning_rate": 6.058519361147055e-06, "loss": 0.369, "step": 89},
+        {"epoch": 1.481994459833795, "grad_norm": 0.4603992998600006, "learning_rate": 5.9635613027404495e-06, "loss": 0.2792, "step": 90},
+        {"epoch": 1.4986149584487536, "grad_norm": 0.433829128742218, "learning_rate": 5.8682408883346535e-06, "loss": 0.2935, "step": 91},
+        {"epoch": 1.5152354570637119, "grad_norm": 0.4892548620700836, "learning_rate": 5.772593964039203e-06, "loss": 0.3591, "step": 92},
+        {"epoch": 1.5318559556786704, "grad_norm": 0.4414325952529907, "learning_rate": 5.6766564987506564e-06, "loss": 0.3312, "step": 93},
+        {"epoch": 1.548476454293629, "grad_norm": 0.5104185938835144, "learning_rate": 5.5804645706261515e-06, "loss": 0.3524, "step": 94},
+        {"epoch": 1.5650969529085872, "grad_norm": 0.46491438150405884, "learning_rate": 5.484054353515896e-06, "loss": 0.3127, "step": 95},
+        {"epoch": 1.5817174515235457, "grad_norm": 0.5037529468536377, "learning_rate": 5.387462103359655e-06, "loss": 0.3549, "step": 96},
+        {"epoch": 1.5983379501385042, "grad_norm": 0.456927090883255, "learning_rate": 5.290724144552379e-06, "loss": 0.3583, "step": 97},
+        {"epoch": 1.6149584487534625, "grad_norm": 0.48146891593933105, "learning_rate": 5.193876856284085e-06, "loss": 0.3485, "step": 98},
+        {"epoch": 1.631578947368421, "grad_norm": 0.45695117115974426, "learning_rate": 5.096956658859122e-06, "loss": 0.3325, "step": 99},
+        {"epoch": 1.6481994459833795, "grad_norm": 0.46289077401161194, "learning_rate": 5e-06, "loss": 0.3461, "step": 100},
+        {"epoch": 1.6648199445983378, "grad_norm": 0.5340746641159058, "learning_rate": 4.903043341140879e-06, "loss": 0.3856, "step": 101},
+        {"epoch": 1.6814404432132966, "grad_norm": 0.433956503868103, "learning_rate": 4.806123143715916e-06, "loss": 0.3166, "step": 102},
+        {"epoch": 1.6980609418282548, "grad_norm": 0.4446304440498352, "learning_rate": 4.7092758554476215e-06, "loss": 0.3378, "step": 103},
+        {"epoch": 1.7146814404432131, "grad_norm": 0.5027093291282654, "learning_rate": 4.6125378966403465e-06, "loss": 0.3915, "step": 104},
+        {"epoch": 1.7313019390581719, "grad_norm": 0.5546647310256958, "learning_rate": 4.515945646484105e-06, "loss": 0.3484, "step": 105},
+        {"epoch": 1.7479224376731302, "grad_norm": 0.49674123525619507, "learning_rate": 4.4195354293738484e-06, "loss": 0.3501, "step": 106},
+        {"epoch": 1.7645429362880887, "grad_norm": 0.5134773850440979, "learning_rate": 4.323343501249346e-06, "loss": 0.3818, "step": 107},
+        {"epoch": 1.7811634349030472, "grad_norm": 0.5111790299415588, "learning_rate": 4.227406035960798e-06, "loss": 0.4027, "step": 108},
+        {"epoch": 1.7977839335180055, "grad_norm": 0.5103554129600525, "learning_rate": 4.131759111665349e-06, "loss": 0.3295, "step": 109},
+        {"epoch": 1.814404432132964, "grad_norm": 0.48488280177116394, "learning_rate": 4.036438697259551e-06, "loss": 0.3339, "step": 110},
+        {"epoch": 1.8310249307479225, "grad_norm": 0.4840296506881714, "learning_rate": 3.941480638852948e-06, "loss": 0.3519, "step": 111},
+        {"epoch": 1.8476454293628808, "grad_norm": 0.4919949471950531, "learning_rate": 3.8469206462878e-06, "loss": 0.328, "step": 112},
+        {"epoch": 1.8642659279778393, "grad_norm": 0.5291365385055542, "learning_rate": 3.752794279710094e-06, "loss": 0.3753, "step": 113},
+        {"epoch": 1.8808864265927978, "grad_norm": 0.4807715117931366, "learning_rate": 3.6591369361968127e-06, "loss": 0.393, "step": 114},
+        {"epoch": 1.897506925207756, "grad_norm": 0.4700012803077698, "learning_rate": 3.5659838364445505e-06, "loss": 0.3182, "step": 115},
+        {"epoch": 1.9141274238227148, "grad_norm": 1.0692706108093262, "learning_rate": 3.473370011524435e-06, "loss": 0.3463, "step": 116},
+        {"epoch": 1.9307479224376731, "grad_norm": 0.49183958768844604, "learning_rate": 3.3813302897083955e-06, "loss": 0.3694, "step": 117},
+        {"epoch": 1.9473684210526314, "grad_norm": 0.5577133893966675, "learning_rate": 3.289899283371657e-06, "loss": 0.3693, "step": 118},
+        {"epoch": 1.9639889196675901, "grad_norm": 0.47118237614631653, "learning_rate": 3.1991113759764493e-06, "loss": 0.3325, "step": 119},
+        {"epoch": 1.9806094182825484, "grad_norm": 0.44954901933670044, "learning_rate": 3.1090007091417884e-06, "loss": 0.3497, "step": 120},
+        {"epoch": 1.997229916897507, "grad_norm": 0.5316449403762817, "learning_rate": 3.019601169804216e-06, "loss": 0.4239, "step": 121},
+        {"epoch": 2.0, "grad_norm": 0.5316449403762817, "learning_rate": 2.9309463774743047e-06, "loss": 0.302, "step": 122},
+        {"epoch": 2.0166204986149583, "grad_norm": 1.3086326122283936, "learning_rate": 2.843069671593734e-06, "loss": 0.2255, "step": 123},
+        {"epoch": 2.033240997229917, "grad_norm": 0.4746488928794861, "learning_rate": 2.7560040989976894e-06, "loss": 0.2275, "step": 124},
+        {"epoch": 2.0498614958448753, "grad_norm": 0.4944143295288086, "learning_rate": 2.6697824014873076e-06, "loss": 0.2648, "step": 125},
+        {"epoch": 2.0664819944598336, "grad_norm": 0.5195774435997009, "learning_rate": 2.5844370035168077e-06, "loss": 0.2707, "step": 126},
+        {"epoch": 2.0831024930747923, "grad_norm": 0.885553240776062, "learning_rate": 2.5000000000000015e-06, "loss": 0.2764, "step": 127},
+        {"epoch": 2.0997229916897506, "grad_norm": 0.5028234124183655, "learning_rate": 2.4165031442406857e-06, "loss": 0.2503, "step": 128},
+        {"epoch": 2.1163434903047094, "grad_norm": 0.4780957102775574, "learning_rate": 2.333977835991545e-06, "loss": 0.2406, "step": 129},
+        {"epoch": 2.1329639889196677, "grad_norm": 0.46052825450897217, "learning_rate": 2.2524551096459703e-06, "loss": 0.2155, "step": 130},
+        {"epoch": 2.149584487534626, "grad_norm": 0.6180452704429626, "learning_rate": 2.171965622567308e-06, "loss": 0.2787, "step": 131},
+        {"epoch": 2.1662049861495847, "grad_norm": 0.6939100027084351, "learning_rate": 2.0925396435598665e-06, "loss": 0.246, "step": 132},
+        {"epoch": 2.182825484764543, "grad_norm": 0.6042692065238953, "learning_rate": 2.0142070414860704e-06, "loss": 0.2609, "step": 133},
+        {"epoch": 2.1994459833795013, "grad_norm": 0.7851183414459229, "learning_rate": 1.936997274033986e-06, "loss": 0.2876, "step": 134},
+        {"epoch": 2.21606648199446, "grad_norm": 0.5801565051078796, "learning_rate": 1.8609393766395083e-06, "loss": 0.288, "step": 135},
+        {"epoch": 2.2326869806094183, "grad_norm": 0.5398533940315247, "learning_rate": 1.7860619515673034e-06, "loss": 0.2958, "step": 136},
+        {"epoch": 2.2493074792243766, "grad_norm": 0.48142921924591064, "learning_rate": 1.7123931571546826e-06, "loss": 0.2506, "step": 137},
+        {"epoch": 2.2659279778393353, "grad_norm": 0.48484477400779724, "learning_rate": 1.639960697222388e-06, "loss": 0.2166, "step": 138},
+        {"epoch": 2.2825484764542936, "grad_norm": 0.4676513075828552, "learning_rate": 1.5687918106563326e-06, "loss": 0.2558, "step": 139},
+        {"epoch": 2.299168975069252, "grad_norm": 0.5008206963539124, "learning_rate": 1.4989132611641576e-06, "loss": 0.2315, "step": 140},
+        {"epoch": 2.3157894736842106, "grad_norm": 0.5055615901947021, "learning_rate": 1.4303513272105057e-06, "loss": 0.278, "step": 141},
+        {"epoch": 2.332409972299169, "grad_norm": 0.5048314332962036, "learning_rate": 1.3631317921347564e-06, "loss": 0.2469, "step": 142},
+        {"epoch": 2.349030470914127, "grad_norm": 0.4561052620410919, "learning_rate": 1.297279934454978e-06, "loss": 0.2363, "step": 143},
+        {"epoch": 2.365650969529086, "grad_norm": 0.4409971237182617, "learning_rate": 1.2328205183616964e-06, "loss": 0.2582, "step": 144},
+        {"epoch": 2.3822714681440442, "grad_norm": 0.5186073780059814, "learning_rate": 1.1697777844051105e-06, "loss": 0.2354, "step": 145},
+        {"epoch": 2.398891966759003, "grad_norm": 0.4931983947753906, "learning_rate": 1.1081754403792e-06, "loss": 0.2628, "step": 146},
+        {"epoch": 2.4155124653739612, "grad_norm": 0.4725812077522278, "learning_rate": 1.0480366524062041e-06, "loss": 0.2465, "step": 147},
+        {"epoch": 2.4321329639889195, "grad_norm": 0.459830641746521, "learning_rate": 9.893840362247809e-07, "loss": 0.2494, "step": 148},
+        {"epoch": 2.4487534626038783, "grad_norm": 0.45882484316825867, "learning_rate": 9.322396486851626e-07, "loss": 0.2572, "step": 149},
+        {"epoch": 2.4653739612188366, "grad_norm": 0.4628044664859772, "learning_rate": 8.766249794544662e-07, "loss": 0.2473, "step": 150},
+        {"epoch": 2.481994459833795, "grad_norm": 0.43482884764671326, "learning_rate": 8.225609429353187e-07, "loss": 0.2334, "step": 151},
+        {"epoch": 2.4986149584487536, "grad_norm": 0.5092786550521851, "learning_rate": 7.700678704007947e-07, "loss": 0.2464, "step": 152},
+        {"epoch": 2.515235457063712, "grad_norm": 0.5002970695495605, "learning_rate": 7.191655023486682e-07, "loss": 0.2386, "step": 153},
+        {"epoch": 2.5318559556786706, "grad_norm": 0.44085896015167236, "learning_rate": 6.698729810778065e-07, "loss": 0.2231, "step": 154},
+        {"epoch": 2.548476454293629, "grad_norm": 0.4750898480415344, "learning_rate": 6.222088434895462e-07, "loss": 0.2746, "step": 155},
+        {"epoch": 2.565096952908587, "grad_norm": 0.5058760643005371, "learning_rate": 5.76191014116711e-07, "loss": 0.2753, "step": 156},
+        {"epoch": 2.581717451523546, "grad_norm": 0.4807314872741699, "learning_rate": 5.318367983829393e-07, "loss": 0.2295, "step": 157},
+        {"epoch": 2.598337950138504, "grad_norm": 0.4975450336933136, "learning_rate": 4.891628760948114e-07, "loss": 0.2623, "step": 158},
+        {"epoch": 2.6149584487534625, "grad_norm": 0.44517505168914795, "learning_rate": 4.481852951692672e-07, "loss": 0.2505, "step": 159},
+        {"epoch": 2.6315789473684212, "grad_norm": 0.526871919631958, "learning_rate": 4.089194655986306e-07, "loss": 0.2944, "step": 160},
+        {"epoch": 2.6481994459833795, "grad_norm": 0.5860976576805115, "learning_rate": 3.7138015365554834e-07, "loss": 0.2929, "step": 161},
+        {"epoch": 2.664819944598338, "grad_norm": 0.5570012927055359, "learning_rate": 3.355814763399973e-07, "loss": 0.2669, "step": 162},
+        {"epoch": 2.6814404432132966, "grad_norm": 0.46305856108665466, "learning_rate": 3.015368960704584e-07, "loss": 0.2464, "step": 163},
+        {"epoch": 2.698060941828255, "grad_norm": 0.49931517243385315, "learning_rate": 2.6925921562124867e-07, "loss": 0.233, "step": 164},
+        {"epoch": 2.714681440443213, "grad_norm": 0.4253719449043274,
1163
+ "learning_rate": 2.3876057330792344e-07,
1164
+ "loss": 0.2115,
1165
+ "step": 165
1166
+ },
1167
+ {
1168
+ "epoch": 2.731301939058172,
1169
+ "grad_norm": 0.46956562995910645,
1170
+ "learning_rate": 2.1005243842255552e-07,
1171
+ "loss": 0.2419,
1172
+ "step": 166
1173
+ },
1174
+ {
1175
+ "epoch": 2.74792243767313,
1176
+ "grad_norm": 0.47405821084976196,
1177
+ "learning_rate": 1.8314560692059836e-07,
1178
+ "loss": 0.2442,
1179
+ "step": 167
1180
+ },
1181
+ {
1182
+ "epoch": 2.7645429362880884,
1183
+ "grad_norm": 0.5373594164848328,
1184
+ "learning_rate": 1.5805019736097105e-07,
1185
+ "loss": 0.304,
1186
+ "step": 168
1187
+ },
1188
+ {
1189
+ "epoch": 2.781163434903047,
1190
+ "grad_norm": 0.49911409616470337,
1191
+ "learning_rate": 1.3477564710088097e-07,
1192
+ "loss": 0.2604,
1193
+ "step": 169
1194
+ },
1195
+ {
1196
+ "epoch": 2.7977839335180055,
1197
+ "grad_norm": 0.524211585521698,
1198
+ "learning_rate": 1.1333070874682217e-07,
1199
+ "loss": 0.2319,
1200
+ "step": 170
1201
+ },
1202
+ {
1203
+ "epoch": 2.8144044321329638,
1204
+ "grad_norm": 0.49799832701683044,
1205
+ "learning_rate": 9.372344686307655e-08,
1206
+ "loss": 0.2648,
1207
+ "step": 171
1208
+ },
1209
+ {
1210
+ "epoch": 2.8310249307479225,
1211
+ "grad_norm": 0.4979800581932068,
1212
+ "learning_rate": 7.59612349389599e-08,
1213
+ "loss": 0.2671,
1214
+ "step": 172
1215
+ },
1216
+ {
1217
+ "epoch": 2.847645429362881,
1218
+ "grad_norm": 0.5030661225318909,
1219
+ "learning_rate": 6.005075261595495e-08,
1220
+ "loss": 0.2219,
1221
+ "step": 173
1222
+ },
1223
+ {
1224
+ "epoch": 2.864265927977839,
1225
+ "grad_norm": 0.4839530885219574,
1226
+ "learning_rate": 4.599798317577342e-08,
1227
+ "loss": 0.2981,
1228
+ "step": 174
1229
+ },
1230
+ {
1231
+ "epoch": 2.880886426592798,
1232
+ "grad_norm": 0.49113729596138,
1233
+ "learning_rate": 3.3808211290284886e-08,
1234
+ "loss": 0.2574,
1235
+ "step": 175
1236
+ },
1237
+ {
1238
+ "epoch": 2.897506925207756,
1239
+ "grad_norm": 0.5154249668121338,
1240
+ "learning_rate": 2.3486021034170857e-08,
1241
+ "loss": 0.2584,
1242
+ "step": 176
1243
+ },
1244
+ {
1245
+ "epoch": 2.914127423822715,
1246
+ "grad_norm": 0.46952885389328003,
1247
+ "learning_rate": 1.5035294161039882e-08,
1248
+ "loss": 0.2785,
1249
+ "step": 177
1250
+ },
1251
+ {
1252
+ "epoch": 2.930747922437673,
1253
+ "grad_norm": 0.49860695004463196,
1254
+ "learning_rate": 8.459208643659122e-09,
1255
+ "loss": 0.2572,
1256
+ "step": 178
1257
+ },
1258
+ {
1259
+ "epoch": 2.9473684210526314,
1260
+ "grad_norm": 0.5341483354568481,
1261
+ "learning_rate": 3.760237478849793e-09,
1262
+ "loss": 0.2964,
1263
+ "step": 179
1264
+ },
1265
+ {
1266
+ "epoch": 2.96398891966759,
1267
+ "grad_norm": 0.5575993061065674,
1268
+ "learning_rate": 9.401477574932927e-10,
1269
+ "loss": 0.2896,
1270
+ "step": 180
1271
+ },
1272
+ {
1273
+ "epoch": 2.96398891966759,
1274
+ "step": 180,
1275
+ "total_flos": 6.743893969836442e+16,
1276
+ "train_loss": 0.3866574793226189,
1277
+ "train_runtime": 24143.75,
1278
+ "train_samples_per_second": 0.179,
1279
+ "train_steps_per_second": 0.007
1280
+ }
1281
+ ],
1282
+ "logging_steps": 1,
1283
+ "max_steps": 180,
1284
+ "num_input_tokens_seen": 0,
1285
+ "num_train_epochs": 3,
1286
+ "save_steps": 100,
1287
+ "stateful_callbacks": {
1288
+ "TrainerControl": {
1289
+ "args": {
1290
+ "should_epoch_stop": false,
1291
+ "should_evaluate": false,
1292
+ "should_log": false,
1293
+ "should_save": true,
1294
+ "should_training_stop": true
1295
+ },
1296
+ "attributes": {}
1297
+ }
1298
+ },
1299
+ "total_flos": 6.743893969836442e+16,
1300
+ "train_batch_size": 1,
1301
+ "trial_name": null,
1302
+ "trial_params": null
1303
+ }
training_loss.png ADDED
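
The per-step `loss` entries added to `log_history` above are what a plot like `training_loss.png` is typically drawn from. As a minimal sketch (not part of this commit), one way to pull the loss curve out of a `trainer_state.json` of this shape; the embedded `state` dict is a small hand-written sample standing in for the full file:

```python
import json

# Sample standing in for json.load(open("trainer_state.json")):
# per-step logs carry "loss"; the final summary entry carries "train_loss" instead.
state = {
    "log_history": [
        {"epoch": 2.9473684210526314, "loss": 0.2964, "step": 179},
        {"epoch": 2.96398891966759, "loss": 0.2896, "step": 180},
        {"epoch": 2.96398891966759, "step": 180, "train_loss": 0.3866574793226189},
    ],
    "max_steps": 180,
}

# Keep only entries that logged a per-step loss, skipping the summary record.
steps = [entry["step"] for entry in state["log_history"] if "loss" in entry]
losses = [entry["loss"] for entry in state["log_history"] if "loss" in entry]

print(steps)   # step numbers for the x-axis
print(losses)  # loss values for the y-axis
```

Feeding `steps` and `losses` to any plotting library would reproduce a training-loss curve like the attached image.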