OpenLeecher committed on
Commit 3962092
1 Parent(s): 1e78fdd

End of training

Files changed (5)
  1. README.md +2 -1
  2. all_results.json +12 -0
  3. eval_results.json +7 -0
  4. train_results.json +8 -0
  5. trainer_state.json +1602 -0
README.md CHANGED
@@ -4,6 +4,7 @@ license: llama3.1
  base_model: meta-llama/Llama-3.1-8B
  tags:
  - llama-factory
+ - full
  - generated_from_trainer
  model-index:
  - name: llama_8b_lima_40
@@ -15,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 
  # llama_8b_lima_40
 
- This model is a fine-tuned version of [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) on the None dataset.
+ This model is a fine-tuned version of [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) on the open_webui_dataset dataset.
  It achieves the following results on the evaluation set:
  - Loss: 0.9288
 
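For context (not part of the commit): a minimal inference sketch for the model described in the README above. It assumes the standard transformers API and that the checkpoint is published under a repo id like OpenLeecher/llama_8b_lima_40; the repo id and the prompt string are assumptions for illustration only.

```python
# Minimal sketch, not part of this commit. Assumes the checkpoint is available
# under the repo id "OpenLeecher/llama_8b_lima_40" (an assumption) and that
# transformers and a torch backend are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenLeecher/llama_8b_lima_40"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" additionally requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = "Write a short note on what this model was fine-tuned for."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```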
all_results.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "epoch": 1.0,
+ "eval_loss": 0.9287646412849426,
+ "eval_runtime": 19.2416,
+ "eval_samples_per_second": 10.394,
+ "eval_steps_per_second": 2.599,
+ "total_flos": 8.200255844856627e+16,
+ "train_loss": 0.8882445046534905,
+ "train_runtime": 9157.2611,
+ "train_samples_per_second": 3.18,
+ "train_steps_per_second": 0.114
+ }
eval_results.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "epoch": 1.0,
+ "eval_loss": 0.9287646412849426,
+ "eval_runtime": 19.2416,
+ "eval_samples_per_second": 10.394,
+ "eval_steps_per_second": 2.599
+ }
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 1.0,
+ "total_flos": 8.200255844856627e+16,
+ "train_loss": 0.8882445046534905,
+ "train_runtime": 9157.2611,
+ "train_samples_per_second": 3.18,
+ "train_steps_per_second": 0.114
+ }
trainer_state.json ADDED
@@ -0,0 +1,1602 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 1.0,
5
+ "eval_steps": 80,
6
+ "global_step": 1040,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.004807692307692308,
13
+ "grad_norm": 172.91949232049635,
14
+ "learning_rate": 7.142857142857143e-07,
15
+ "loss": 1.275,
16
+ "step": 5
17
+ },
18
+ {
19
+ "epoch": 0.009615384615384616,
20
+ "grad_norm": 15.162819415928812,
21
+ "learning_rate": 1.4285714285714286e-06,
22
+ "loss": 1.0842,
23
+ "step": 10
24
+ },
25
+ {
26
+ "epoch": 0.014423076923076924,
27
+ "grad_norm": 43.156613200294984,
28
+ "learning_rate": 2.142857142857143e-06,
29
+ "loss": 1.0731,
30
+ "step": 15
31
+ },
32
+ {
33
+ "epoch": 0.019230769230769232,
34
+ "grad_norm": 12.05730410265307,
35
+ "learning_rate": 2.8571428571428573e-06,
36
+ "loss": 0.9966,
37
+ "step": 20
38
+ },
39
+ {
40
+ "epoch": 0.02403846153846154,
41
+ "grad_norm": 29.645022197375592,
42
+ "learning_rate": 3.5714285714285714e-06,
43
+ "loss": 0.9506,
44
+ "step": 25
45
+ },
46
+ {
47
+ "epoch": 0.028846153846153848,
48
+ "grad_norm": 27.496331665845375,
49
+ "learning_rate": 4.285714285714286e-06,
50
+ "loss": 0.9081,
51
+ "step": 30
52
+ },
53
+ {
54
+ "epoch": 0.03365384615384615,
55
+ "grad_norm": 33.04607017399404,
56
+ "learning_rate": 5e-06,
57
+ "loss": 0.9839,
58
+ "step": 35
59
+ },
60
+ {
61
+ "epoch": 0.038461538461538464,
62
+ "grad_norm": 4.67215100059801,
63
+ "learning_rate": 5.7142857142857145e-06,
64
+ "loss": 0.9143,
65
+ "step": 40
66
+ },
67
+ {
68
+ "epoch": 0.04326923076923077,
69
+ "grad_norm": 3.5686279553698648,
70
+ "learning_rate": 5.958760472832704e-06,
71
+ "loss": 1.0449,
72
+ "step": 45
73
+ },
74
+ {
75
+ "epoch": 0.04807692307692308,
76
+ "grad_norm": 3.5137053750826093,
77
+ "learning_rate": 5.890441320869003e-06,
78
+ "loss": 1.1233,
79
+ "step": 50
80
+ },
81
+ {
82
+ "epoch": 0.052884615384615384,
83
+ "grad_norm": 39.50156599931848,
84
+ "learning_rate": 5.822637783235761e-06,
85
+ "loss": 0.7838,
86
+ "step": 55
87
+ },
88
+ {
89
+ "epoch": 0.057692307692307696,
90
+ "grad_norm": 3.9954682494826987,
91
+ "learning_rate": 5.755348556225628e-06,
92
+ "loss": 0.937,
93
+ "step": 60
94
+ },
95
+ {
96
+ "epoch": 0.0625,
97
+ "grad_norm": 3.606702118329892,
98
+ "learning_rate": 5.688572332818116e-06,
99
+ "loss": 0.9261,
100
+ "step": 65
101
+ },
102
+ {
103
+ "epoch": 0.0673076923076923,
104
+ "grad_norm": 5.018248874418577,
105
+ "learning_rate": 5.622307802654199e-06,
106
+ "loss": 0.929,
107
+ "step": 70
108
+ },
109
+ {
110
+ "epoch": 0.07211538461538461,
111
+ "grad_norm": 14.46426266685474,
112
+ "learning_rate": 5.556553652010609e-06,
113
+ "loss": 1.0281,
114
+ "step": 75
115
+ },
116
+ {
117
+ "epoch": 0.07692307692307693,
118
+ "grad_norm": 6.717639424187089,
119
+ "learning_rate": 5.4913085637737825e-06,
120
+ "loss": 1.0252,
121
+ "step": 80
122
+ },
123
+ {
124
+ "epoch": 0.07692307692307693,
125
+ "eval_loss": 1.0117295980453491,
126
+ "eval_runtime": 22.175,
127
+ "eval_samples_per_second": 9.019,
128
+ "eval_steps_per_second": 2.255,
129
+ "step": 80
130
+ },
131
+ {
132
+ "epoch": 0.08173076923076923,
133
+ "grad_norm": 5.9628841757039055,
134
+ "learning_rate": 5.42657121741348e-06,
135
+ "loss": 0.9798,
136
+ "step": 85
137
+ },
138
+ {
139
+ "epoch": 0.08653846153846154,
140
+ "grad_norm": 2.8480154385386633,
141
+ "learning_rate": 5.362340288956054e-06,
142
+ "loss": 0.9422,
143
+ "step": 90
144
+ },
145
+ {
146
+ "epoch": 0.09134615384615384,
147
+ "grad_norm": 3.041511518203806,
148
+ "learning_rate": 5.298614450957377e-06,
149
+ "loss": 0.7751,
150
+ "step": 95
151
+ },
152
+ {
153
+ "epoch": 0.09615384615384616,
154
+ "grad_norm": 3.245117237641717,
155
+ "learning_rate": 5.235392372475402e-06,
156
+ "loss": 1.0559,
157
+ "step": 100
158
+ },
159
+ {
160
+ "epoch": 0.10096153846153846,
161
+ "grad_norm": 4.186131160050305,
162
+ "learning_rate": 5.1726727190423596e-06,
163
+ "loss": 0.8535,
164
+ "step": 105
165
+ },
166
+ {
167
+ "epoch": 0.10576923076923077,
168
+ "grad_norm": 3.2112656498482743,
169
+ "learning_rate": 5.110454152636601e-06,
170
+ "loss": 1.0847,
171
+ "step": 110
172
+ },
173
+ {
174
+ "epoch": 0.11057692307692307,
175
+ "grad_norm": 3.5829116342694443,
176
+ "learning_rate": 5.04873533165404e-06,
177
+ "loss": 0.989,
178
+ "step": 115
179
+ },
180
+ {
181
+ "epoch": 0.11538461538461539,
182
+ "grad_norm": 2.8192163511739308,
183
+ "learning_rate": 4.987514910879233e-06,
184
+ "loss": 0.7562,
185
+ "step": 120
186
+ },
187
+ {
188
+ "epoch": 0.1201923076923077,
189
+ "grad_norm": 3.552581067366997,
190
+ "learning_rate": 4.9267915414560465e-06,
191
+ "loss": 0.882,
192
+ "step": 125
193
+ },
194
+ {
195
+ "epoch": 0.125,
196
+ "grad_norm": 3.166131159213283,
197
+ "learning_rate": 4.866563870857949e-06,
198
+ "loss": 0.8461,
199
+ "step": 130
200
+ },
201
+ {
202
+ "epoch": 0.12980769230769232,
203
+ "grad_norm": 3.4902158184612873,
204
+ "learning_rate": 4.806830542857871e-06,
205
+ "loss": 1.0949,
206
+ "step": 135
207
+ },
208
+ {
209
+ "epoch": 0.1346153846153846,
210
+ "grad_norm": 2.763230275625746,
211
+ "learning_rate": 4.7475901974976784e-06,
212
+ "loss": 0.9741,
213
+ "step": 140
214
+ },
215
+ {
216
+ "epoch": 0.13942307692307693,
217
+ "grad_norm": 3.7680960024565047,
218
+ "learning_rate": 4.688841471057191e-06,
219
+ "loss": 0.8267,
220
+ "step": 145
221
+ },
222
+ {
223
+ "epoch": 0.14423076923076922,
224
+ "grad_norm": 3.7223152035406177,
225
+ "learning_rate": 4.630582996022805e-06,
226
+ "loss": 0.9237,
227
+ "step": 150
228
+ },
229
+ {
230
+ "epoch": 0.14903846153846154,
231
+ "grad_norm": 163.68789967501425,
232
+ "learning_rate": 4.572813401055646e-06,
233
+ "loss": 0.9735,
234
+ "step": 155
235
+ },
236
+ {
237
+ "epoch": 0.15384615384615385,
238
+ "grad_norm": 4.052744089990857,
239
+ "learning_rate": 4.515531310959294e-06,
240
+ "loss": 0.8185,
241
+ "step": 160
242
+ },
243
+ {
244
+ "epoch": 0.15384615384615385,
245
+ "eval_loss": 0.9820164442062378,
246
+ "eval_runtime": 20.5987,
247
+ "eval_samples_per_second": 9.709,
248
+ "eval_steps_per_second": 2.427,
249
+ "step": 160
250
+ },
251
+ {
252
+ "epoch": 0.15865384615384615,
253
+ "grad_norm": 3.5962134321693213,
254
+ "learning_rate": 4.458735346647049e-06,
255
+ "loss": 0.9701,
256
+ "step": 165
257
+ },
258
+ {
259
+ "epoch": 0.16346153846153846,
260
+ "grad_norm": 3.405720690826482,
261
+ "learning_rate": 4.402424125108714e-06,
262
+ "loss": 0.7428,
263
+ "step": 170
264
+ },
265
+ {
266
+ "epoch": 0.16826923076923078,
267
+ "grad_norm": 3.5656581164655297,
268
+ "learning_rate": 4.346596259376934e-06,
269
+ "loss": 1.0573,
270
+ "step": 175
271
+ },
272
+ {
273
+ "epoch": 0.17307692307692307,
274
+ "grad_norm": 3.1116839574479944,
275
+ "learning_rate": 4.291250358493015e-06,
276
+ "loss": 0.99,
277
+ "step": 180
278
+ },
279
+ {
280
+ "epoch": 0.1778846153846154,
281
+ "grad_norm": 3.1856579669538037,
282
+ "learning_rate": 4.236385027472282e-06,
283
+ "loss": 0.9208,
284
+ "step": 185
285
+ },
286
+ {
287
+ "epoch": 0.18269230769230768,
288
+ "grad_norm": 2.713262969000155,
289
+ "learning_rate": 4.181998867268901e-06,
290
+ "loss": 0.9552,
291
+ "step": 190
292
+ },
293
+ {
294
+ "epoch": 0.1875,
295
+ "grad_norm": 3.4690878970474364,
296
+ "learning_rate": 4.1280904747402165e-06,
297
+ "loss": 0.9004,
298
+ "step": 195
299
+ },
300
+ {
301
+ "epoch": 0.19230769230769232,
302
+ "grad_norm": 2.6094836512830755,
303
+ "learning_rate": 4.07465844261054e-06,
304
+ "loss": 1.0189,
305
+ "step": 200
306
+ },
307
+ {
308
+ "epoch": 0.1971153846153846,
309
+ "grad_norm": 2.7258662188339917,
310
+ "learning_rate": 4.021701359434411e-06,
311
+ "loss": 0.8663,
312
+ "step": 205
313
+ },
314
+ {
315
+ "epoch": 0.20192307692307693,
316
+ "grad_norm": 2.130745708170683,
317
+ "learning_rate": 3.9692178095593185e-06,
318
+ "loss": 0.9191,
319
+ "step": 210
320
+ },
321
+ {
322
+ "epoch": 0.20673076923076922,
323
+ "grad_norm": 3.632025896546127,
324
+ "learning_rate": 3.917206373087843e-06,
325
+ "loss": 0.8463,
326
+ "step": 215
327
+ },
328
+ {
329
+ "epoch": 0.21153846153846154,
330
+ "grad_norm": 2.8163127172248754,
331
+ "learning_rate": 3.86566562583925e-06,
332
+ "loss": 0.9113,
333
+ "step": 220
334
+ },
335
+ {
336
+ "epoch": 0.21634615384615385,
337
+ "grad_norm": 2.925143301211318,
338
+ "learning_rate": 3.814594139310489e-06,
339
+ "loss": 0.8026,
340
+ "step": 225
341
+ },
342
+ {
343
+ "epoch": 0.22115384615384615,
344
+ "grad_norm": 3.491601498263278,
345
+ "learning_rate": 3.7639904806365957e-06,
346
+ "loss": 1.0014,
347
+ "step": 230
348
+ },
349
+ {
350
+ "epoch": 0.22596153846153846,
351
+ "grad_norm": 3.60018394829918,
352
+ "learning_rate": 3.7138532125504874e-06,
353
+ "loss": 0.8704,
354
+ "step": 235
355
+ },
356
+ {
357
+ "epoch": 0.23076923076923078,
358
+ "grad_norm": 2.8380175093955193,
359
+ "learning_rate": 3.664180893342146e-06,
360
+ "loss": 0.9686,
361
+ "step": 240
362
+ },
363
+ {
364
+ "epoch": 0.23076923076923078,
365
+ "eval_loss": 0.9701676964759827,
366
+ "eval_runtime": 21.0027,
367
+ "eval_samples_per_second": 9.523,
368
+ "eval_steps_per_second": 2.381,
369
+ "step": 240
370
+ },
371
+ {
372
+ "epoch": 0.23557692307692307,
373
+ "grad_norm": 2.88654950044071,
374
+ "learning_rate": 3.6149720768171497e-06,
375
+ "loss": 0.9927,
376
+ "step": 245
377
+ },
378
+ {
379
+ "epoch": 0.2403846153846154,
380
+ "grad_norm": 4.075449595613525,
381
+ "learning_rate": 3.5662253122545742e-06,
382
+ "loss": 0.8335,
383
+ "step": 250
384
+ },
385
+ {
386
+ "epoch": 0.24519230769230768,
387
+ "grad_norm": 3.1216694362939137,
388
+ "learning_rate": 3.517939144364211e-06,
389
+ "loss": 0.9225,
390
+ "step": 255
391
+ },
392
+ {
393
+ "epoch": 0.25,
394
+ "grad_norm": 3.4006563474977787,
395
+ "learning_rate": 3.4701121132431283e-06,
396
+ "loss": 0.9645,
397
+ "step": 260
398
+ },
399
+ {
400
+ "epoch": 0.2548076923076923,
401
+ "grad_norm": 5.006159977047571,
402
+ "learning_rate": 3.422742754331519e-06,
403
+ "loss": 1.0596,
404
+ "step": 265
405
+ },
406
+ {
407
+ "epoch": 0.25961538461538464,
408
+ "grad_norm": 6.352134442675443,
409
+ "learning_rate": 3.3758295983678575e-06,
410
+ "loss": 0.8279,
411
+ "step": 270
412
+ },
413
+ {
414
+ "epoch": 0.2644230769230769,
415
+ "grad_norm": 4.599350051977448,
416
+ "learning_rate": 3.329371171343321e-06,
417
+ "loss": 0.7653,
418
+ "step": 275
419
+ },
420
+ {
421
+ "epoch": 0.2692307692307692,
422
+ "grad_norm": 3.775428351149461,
423
+ "learning_rate": 3.2833659944554757e-06,
424
+ "loss": 0.8703,
425
+ "step": 280
426
+ },
427
+ {
428
+ "epoch": 0.27403846153846156,
429
+ "grad_norm": 2.5830965772659287,
430
+ "learning_rate": 3.2378125840611978e-06,
431
+ "loss": 0.826,
432
+ "step": 285
433
+ },
434
+ {
435
+ "epoch": 0.27884615384615385,
436
+ "grad_norm": 3.6279739322810642,
437
+ "learning_rate": 3.192709451628821e-06,
438
+ "loss": 0.8617,
439
+ "step": 290
440
+ },
441
+ {
442
+ "epoch": 0.28365384615384615,
443
+ "grad_norm": 2.574184072314173,
444
+ "learning_rate": 3.1480551036895063e-06,
445
+ "loss": 0.9925,
446
+ "step": 295
447
+ },
448
+ {
449
+ "epoch": 0.28846153846153844,
450
+ "grad_norm": 3.245622107244932,
451
+ "learning_rate": 3.1038480417877728e-06,
452
+ "loss": 0.8276,
453
+ "step": 300
454
+ },
455
+ {
456
+ "epoch": 0.2932692307692308,
457
+ "grad_norm": 2.7094531818622385,
458
+ "learning_rate": 3.0600867624312124e-06,
459
+ "loss": 0.93,
460
+ "step": 305
461
+ },
462
+ {
463
+ "epoch": 0.2980769230769231,
464
+ "grad_norm": 3.4108002405937996,
465
+ "learning_rate": 3.0167697570393586e-06,
466
+ "loss": 0.9093,
467
+ "step": 310
468
+ },
469
+ {
470
+ "epoch": 0.30288461538461536,
471
+ "grad_norm": 3.2261512213908468,
472
+ "learning_rate": 2.973895511891673e-06,
473
+ "loss": 0.8436,
474
+ "step": 315
475
+ },
476
+ {
477
+ "epoch": 0.3076923076923077,
478
+ "grad_norm": 2.9111217804814733,
479
+ "learning_rate": 2.9314625080746407e-06,
480
+ "loss": 0.7962,
481
+ "step": 320
482
+ },
483
+ {
484
+ "epoch": 0.3076923076923077,
485
+ "eval_loss": 0.9604336619377136,
486
+ "eval_runtime": 20.7503,
487
+ "eval_samples_per_second": 9.638,
488
+ "eval_steps_per_second": 2.41,
489
+ "step": 320
490
+ },
491
+ {
492
+ "epoch": 0.3125,
493
+ "grad_norm": 3.0069826903052568,
494
+ "learning_rate": 2.8894692214279614e-06,
495
+ "loss": 0.9501,
496
+ "step": 325
497
+ },
498
+ {
499
+ "epoch": 0.3173076923076923,
500
+ "grad_norm": 2.7402700321309497,
501
+ "learning_rate": 2.8479141224897947e-06,
502
+ "loss": 0.8932,
503
+ "step": 330
504
+ },
505
+ {
506
+ "epoch": 0.32211538461538464,
507
+ "grad_norm": 2.850461668225791,
508
+ "learning_rate": 2.806795676441052e-06,
509
+ "loss": 0.8509,
510
+ "step": 335
511
+ },
512
+ {
513
+ "epoch": 0.3269230769230769,
514
+ "grad_norm": 2.8055976999039833,
515
+ "learning_rate": 2.7661123430487023e-06,
516
+ "loss": 0.8531,
517
+ "step": 340
518
+ },
519
+ {
520
+ "epoch": 0.3317307692307692,
521
+ "grad_norm": 3.950790855598453,
522
+ "learning_rate": 2.725862576608072e-06,
523
+ "loss": 0.8428,
524
+ "step": 345
525
+ },
526
+ {
527
+ "epoch": 0.33653846153846156,
528
+ "grad_norm": 2.608925093832874,
529
+ "learning_rate": 2.6860448258841182e-06,
530
+ "loss": 0.9324,
531
+ "step": 350
532
+ },
533
+ {
534
+ "epoch": 0.34134615384615385,
535
+ "grad_norm": 4.161582883109561,
536
+ "learning_rate": 2.6466575340516312e-06,
537
+ "loss": 0.8302,
538
+ "step": 355
539
+ },
540
+ {
541
+ "epoch": 0.34615384615384615,
542
+ "grad_norm": 3.223665474192437,
543
+ "learning_rate": 2.607699138634365e-06,
544
+ "loss": 1.0338,
545
+ "step": 360
546
+ },
547
+ {
548
+ "epoch": 0.35096153846153844,
549
+ "grad_norm": 4.360630028683017,
550
+ "learning_rate": 2.5691680714430463e-06,
551
+ "loss": 0.781,
552
+ "step": 365
553
+ },
554
+ {
555
+ "epoch": 0.3557692307692308,
556
+ "grad_norm": 3.2326801834772256,
557
+ "learning_rate": 2.531062758512248e-06,
558
+ "loss": 0.9277,
559
+ "step": 370
560
+ },
561
+ {
562
+ "epoch": 0.3605769230769231,
563
+ "grad_norm": 3.518325507567999,
564
+ "learning_rate": 2.493381620036082e-06,
565
+ "loss": 0.7648,
566
+ "step": 375
567
+ },
568
+ {
569
+ "epoch": 0.36538461538461536,
570
+ "grad_norm": 3.905842925893686,
571
+ "learning_rate": 2.4561230703027005e-06,
572
+ "loss": 0.7278,
573
+ "step": 380
574
+ },
575
+ {
576
+ "epoch": 0.3701923076923077,
577
+ "grad_norm": 5.371293959548764,
578
+ "learning_rate": 2.4192855176275597e-06,
579
+ "loss": 0.7564,
580
+ "step": 385
581
+ },
582
+ {
583
+ "epoch": 0.375,
584
+ "grad_norm": 2.850075623051217,
585
+ "learning_rate": 2.382867364285416e-06,
586
+ "loss": 0.7983,
587
+ "step": 390
588
+ },
589
+ {
590
+ "epoch": 0.3798076923076923,
591
+ "grad_norm": 6.661652196819241,
592
+ "learning_rate": 2.3468670064410194e-06,
593
+ "loss": 0.9005,
594
+ "step": 395
595
+ },
596
+ {
597
+ "epoch": 0.38461538461538464,
598
+ "grad_norm": 4.700394864120094,
599
+ "learning_rate": 2.3112828340784763e-06,
600
+ "loss": 0.8669,
601
+ "step": 400
602
+ },
603
+ {
604
+ "epoch": 0.38461538461538464,
605
+ "eval_loss": 0.9519588351249695,
606
+ "eval_runtime": 20.79,
607
+ "eval_samples_per_second": 9.62,
608
+ "eval_steps_per_second": 2.405,
609
+ "step": 400
610
+ },
611
+ {
612
+ "epoch": 0.3894230769230769,
613
+ "grad_norm": 3.3197778882289297,
614
+ "learning_rate": 2.2761132309292435e-06,
615
+ "loss": 0.8864,
616
+ "step": 405
617
+ },
618
+ {
619
+ "epoch": 0.3942307692307692,
620
+ "grad_norm": 4.198490325675027,
621
+ "learning_rate": 2.241356574398701e-06,
622
+ "loss": 0.9219,
623
+ "step": 410
624
+ },
625
+ {
626
+ "epoch": 0.39903846153846156,
627
+ "grad_norm": 8.447734132502742,
628
+ "learning_rate": 2.2070112354912867e-06,
629
+ "loss": 0.9542,
630
+ "step": 415
631
+ },
632
+ {
633
+ "epoch": 0.40384615384615385,
634
+ "grad_norm": 3.6043476480492873,
635
+ "learning_rate": 2.1730755787341422e-06,
636
+ "loss": 0.7828,
637
+ "step": 420
638
+ },
639
+ {
640
+ "epoch": 0.40865384615384615,
641
+ "grad_norm": 3.550876988072227,
642
+ "learning_rate": 2.1395479620992237e-06,
643
+ "loss": 0.9213,
644
+ "step": 425
645
+ },
646
+ {
647
+ "epoch": 0.41346153846153844,
648
+ "grad_norm": 4.346265355776214,
649
+ "learning_rate": 2.1064267369238405e-06,
650
+ "loss": 0.8832,
651
+ "step": 430
652
+ },
653
+ {
654
+ "epoch": 0.4182692307692308,
655
+ "grad_norm": 8.956356184457416,
656
+ "learning_rate": 2.0737102478295753e-06,
657
+ "loss": 1.0524,
658
+ "step": 435
659
+ },
660
+ {
661
+ "epoch": 0.4230769230769231,
662
+ "grad_norm": 4.0026073252992225,
663
+ "learning_rate": 2.0413968326395454e-06,
664
+ "loss": 0.8951,
665
+ "step": 440
666
+ },
667
+ {
668
+ "epoch": 0.42788461538461536,
669
+ "grad_norm": 3.769313811024604,
670
+ "learning_rate": 2.009484822293941e-06,
671
+ "loss": 0.8803,
672
+ "step": 445
673
+ },
674
+ {
675
+ "epoch": 0.4326923076923077,
676
+ "grad_norm": 3.4579810908904927,
677
+ "learning_rate": 1.9779725407638038e-06,
678
+ "loss": 0.8575,
679
+ "step": 450
680
+ },
681
+ {
682
+ "epoch": 0.4375,
683
+ "grad_norm": 3.6235925820400112,
684
+ "learning_rate": 1.946858304962993e-06,
685
+ "loss": 0.874,
686
+ "step": 455
687
+ },
688
+ {
689
+ "epoch": 0.4423076923076923,
690
+ "grad_norm": 3.2454132821623607,
691
+ "learning_rate": 1.9161404246582834e-06,
692
+ "loss": 1.0103,
693
+ "step": 460
694
+ },
695
+ {
696
+ "epoch": 0.44711538461538464,
697
+ "grad_norm": 3.438741636237806,
698
+ "learning_rate": 1.8858172023775289e-06,
699
+ "loss": 0.8943,
700
+ "step": 465
701
+ },
702
+ {
703
+ "epoch": 0.4519230769230769,
704
+ "grad_norm": 3.1798809256755205,
705
+ "learning_rate": 1.8558869333158512e-06,
706
+ "loss": 0.9638,
707
+ "step": 470
708
+ },
709
+ {
710
+ "epoch": 0.4567307692307692,
711
+ "grad_norm": 3.6082058444107177,
712
+ "learning_rate": 1.8263479052397838e-06,
713
+ "loss": 0.8781,
714
+ "step": 475
715
+ },
716
+ {
717
+ "epoch": 0.46153846153846156,
718
+ "grad_norm": 2.83102154533938,
719
+ "learning_rate": 1.7971983983893046e-06,
720
+ "loss": 0.8883,
721
+ "step": 480
722
+ },
723
+ {
724
+ "epoch": 0.46153846153846156,
725
+ "eval_loss": 0.9505824446678162,
726
+ "eval_runtime": 20.9063,
727
+ "eval_samples_per_second": 9.566,
728
+ "eval_steps_per_second": 2.392,
729
+ "step": 480
730
+ },
731
+ {
732
+ "epoch": 0.46634615384615385,
733
+ "grad_norm": 2.9075319767858425,
734
+ "learning_rate": 1.768436685377699e-06,
735
+ "loss": 0.7087,
736
+ "step": 485
737
+ },
738
+ {
739
+ "epoch": 0.47115384615384615,
740
+ "grad_norm": 3.7507183698931117,
741
+ "learning_rate": 1.7400610310891816e-06,
742
+ "loss": 0.928,
743
+ "step": 490
744
+ },
745
+ {
746
+ "epoch": 0.47596153846153844,
747
+ "grad_norm": 3.0576523378992326,
748
+ "learning_rate": 1.7120696925742107e-06,
749
+ "loss": 0.8047,
750
+ "step": 495
751
+ },
752
+ {
753
+ "epoch": 0.4807692307692308,
754
+ "grad_norm": 2.6687945237895287,
755
+ "learning_rate": 1.6844609189424112e-06,
756
+ "loss": 1.0923,
757
+ "step": 500
758
+ },
759
+ {
760
+ "epoch": 0.4855769230769231,
761
+ "grad_norm": 3.7056881913494277,
762
+ "learning_rate": 1.6572329512530394e-06,
763
+ "loss": 0.7718,
764
+ "step": 505
765
+ },
766
+ {
767
+ "epoch": 0.49038461538461536,
768
+ "grad_norm": 4.261130783269975,
769
+ "learning_rate": 1.630384022402907e-06,
770
+ "loss": 0.7462,
771
+ "step": 510
772
+ },
773
+ {
774
+ "epoch": 0.4951923076923077,
775
+ "grad_norm": 2.8143821099136024,
776
+ "learning_rate": 1.6039123570116796e-06,
777
+ "loss": 0.965,
778
+ "step": 515
779
+ },
780
+ {
781
+ "epoch": 0.5,
782
+ "grad_norm": 3.0264813559392616,
783
+ "learning_rate": 1.5778161713044614e-06,
784
+ "loss": 0.8943,
785
+ "step": 520
786
+ },
787
+ {
788
+ "epoch": 0.5048076923076923,
789
+ "grad_norm": 18.246495136897703,
790
+ "learning_rate": 1.5520936729915777e-06,
791
+ "loss": 0.9694,
792
+ "step": 525
793
+ },
794
+ {
795
+ "epoch": 0.5096153846153846,
796
+ "grad_norm": 4.039649841411536,
797
+ "learning_rate": 1.5267430611454654e-06,
798
+ "loss": 0.8589,
799
+ "step": 530
800
+ },
801
+ {
802
+ "epoch": 0.5144230769230769,
803
+ "grad_norm": 3.028129518354503,
804
+ "learning_rate": 1.5017625260745615e-06,
805
+ "loss": 0.8761,
806
+ "step": 535
807
+ },
808
+ {
809
+ "epoch": 0.5192307692307693,
810
+ "grad_norm": 3.0504275368028115,
811
+ "learning_rate": 1.4771502491940911e-06,
812
+ "loss": 0.9293,
813
+ "step": 540
814
+ },
815
+ {
816
+ "epoch": 0.5240384615384616,
817
+ "grad_norm": 2.520216608258428,
818
+ "learning_rate": 1.4529044028936606e-06,
819
+ "loss": 0.7738,
820
+ "step": 545
821
+ },
822
+ {
823
+ "epoch": 0.5288461538461539,
824
+ "grad_norm": 3.4732840458118197,
825
+ "learning_rate": 1.4290231504015187e-06,
826
+ "loss": 0.8173,
827
+ "step": 550
828
+ },
829
+ {
830
+ "epoch": 0.5336538461538461,
831
+ "grad_norm": 2.992673074333473,
832
+ "learning_rate": 1.4055046456453867e-06,
833
+ "loss": 1.0166,
834
+ "step": 555
835
+ },
836
+ {
837
+ "epoch": 0.5384615384615384,
838
+ "grad_norm": 3.676863247659791,
839
+ "learning_rate": 1.3823470331097324e-06,
840
+ "loss": 0.7636,
841
+ "step": 560
842
+ },
843
+ {
844
+ "epoch": 0.5384615384615384,
845
+ "eval_loss": 0.9441266059875488,
846
+ "eval_runtime": 20.933,
847
+ "eval_samples_per_second": 9.554,
848
+ "eval_steps_per_second": 2.389,
849
+ "step": 560
850
+ },
851
+ {
852
+ "epoch": 0.5432692307692307,
853
+ "grad_norm": 2.562908465662044,
854
+ "learning_rate": 1.3595484476893454e-06,
855
+ "loss": 0.9229,
856
+ "step": 565
857
+ },
858
+ {
859
+ "epoch": 0.5480769230769231,
860
+ "grad_norm": 2.2982897576935724,
861
+ "learning_rate": 1.3371070145391023e-06,
862
+ "loss": 0.8806,
863
+ "step": 570
864
+ },
865
+ {
866
+ "epoch": 0.5528846153846154,
867
+ "grad_norm": 4.029788762639043,
868
+ "learning_rate": 1.3150208489197545e-06,
869
+ "loss": 0.7314,
870
+ "step": 575
871
+ },
872
+ {
873
+ "epoch": 0.5576923076923077,
874
+ "grad_norm": 3.4816155172912575,
875
+ "learning_rate": 1.2932880560396128e-06,
876
+ "loss": 0.819,
877
+ "step": 580
878
+ },
879
+ {
880
+ "epoch": 0.5625,
881
+ "grad_norm": 3.8108295243391868,
882
+ "learning_rate": 1.2719067308919584e-06,
883
+ "loss": 0.7222,
884
+ "step": 585
885
+ },
886
+ {
887
+ "epoch": 0.5673076923076923,
888
+ "grad_norm": 2.7857292629014183,
889
+ "learning_rate": 1.2508749580880287e-06,
890
+ "loss": 0.8022,
891
+ "step": 590
892
+ },
893
+ {
894
+ "epoch": 0.5721153846153846,
895
+ "grad_norm": 3.6021354748640677,
896
+ "learning_rate": 1.2301908116853925e-06,
897
+ "loss": 0.884,
898
+ "step": 595
899
+ },
900
+ {
901
+ "epoch": 0.5769230769230769,
902
+ "grad_norm": 3.135380180508478,
903
+ "learning_rate": 1.2098523550115558e-06,
904
+ "loss": 1.0023,
905
+ "step": 600
906
+ },
907
+ {
908
+ "epoch": 0.5817307692307693,
909
+ "grad_norm": 3.3653027564726035,
910
+ "learning_rate": 1.189857640482588e-06,
911
+ "loss": 0.9518,
912
+ "step": 605
913
+ },
914
+ {
915
+ "epoch": 0.5865384615384616,
916
+ "grad_norm": 2.459430693726985,
917
+ "learning_rate": 1.170204709416585e-06,
918
+ "loss": 0.8211,
919
+ "step": 610
920
+ },
921
+ {
922
+ "epoch": 0.5913461538461539,
923
+ "grad_norm": 5.022938552667774,
924
+ "learning_rate": 1.1508915918417567e-06,
925
+ "loss": 0.7398,
926
+ "step": 615
927
+ },
928
+ {
929
+ "epoch": 0.5961538461538461,
930
+ "grad_norm": 3.8724856541183357,
931
+ "learning_rate": 1.1319163062989139e-06,
932
+ "loss": 0.941,
933
+ "step": 620
934
+ },
935
+ {
936
+ "epoch": 0.6009615384615384,
937
+ "grad_norm": 3.1280693366860963,
938
+ "learning_rate": 1.1132768596381337e-06,
939
+ "loss": 0.815,
940
+ "step": 625
941
+ },
942
+ {
943
+ "epoch": 0.6057692307692307,
944
+ "grad_norm": 2.8201015243807284,
945
+ "learning_rate": 1.0949712468093497e-06,
946
+ "loss": 0.8991,
947
+ "step": 630
948
+ },
949
+ {
950
+ "epoch": 0.6105769230769231,
951
+ "grad_norm": 3.32788176588362,
952
+ "learning_rate": 1.076997450646619e-06,
953
+ "loss": 0.9282,
954
+ "step": 635
955
+ },
956
+ {
957
+ "epoch": 0.6153846153846154,
958
+ "grad_norm": 3.9582374514755134,
959
+ "learning_rate": 1.0593534416457847e-06,
960
+ "loss": 0.8221,
961
+ "step": 640
962
+ },
963
+ {
964
+ "epoch": 0.6153846153846154,
965
+ "eval_loss": 0.9404194355010986,
966
+ "eval_runtime": 21.0496,
967
+ "eval_samples_per_second": 9.501,
968
+ "eval_steps_per_second": 2.375,
969
+ "step": 640
970
+ },
971
+ {
972
+ "epoch": 0.6201923076923077,
973
+ "grad_norm": 2.5869189332376004,
974
+ "learning_rate": 1.0420371777352623e-06,
975
+ "loss": 0.8804,
976
+ "step": 645
977
+ },
978
+ {
979
+ "epoch": 0.625,
980
+ "grad_norm": 2.53500848922609,
981
+ "learning_rate": 1.0250466040396306e-06,
982
+ "loss": 0.7947,
983
+ "step": 650
984
+ },
985
+ {
986
+ "epoch": 0.6298076923076923,
987
+ "grad_norm": 3.07037325829785,
988
+ "learning_rate": 1.0083796526357243e-06,
989
+ "loss": 0.8485,
990
+ "step": 655
991
+ },
992
+ {
993
+ "epoch": 0.6346153846153846,
994
+ "grad_norm": 2.5949762709128814,
995
+ "learning_rate": 9.920342423008766e-07,
996
+ "loss": 0.7737,
997
+ "step": 660
998
+ },
999
+ {
1000
+ "epoch": 0.6394230769230769,
1001
+ "grad_norm": 3.723350500191604,
1002
+ "learning_rate": 9.760082782529624e-07,
1003
+ "loss": 0.8044,
1004
+ "step": 665
1005
+ },
1006
+ {
1007
+ "epoch": 0.6442307692307693,
1008
+ "grad_norm": 2.91223481306706,
1009
+ "learning_rate": 9.602996518818617e-07,
1010
+ "loss": 0.8059,
1011
+ "step": 670
1012
+ },
1013
+ {
1014
+ "epoch": 0.6490384615384616,
1015
+ "grad_norm": 3.228159750161236,
1016
+ "learning_rate": 9.449062404719376e-07,
1017
+ "loss": 0.9736,
1018
+ "step": 675
1019
+ },
1020
+ {
1021
+ "epoch": 0.6538461538461539,
1022
+ "grad_norm": 4.2304614726707594,
1023
+ "learning_rate": 9.298259069151074e-07,
1024
+ "loss": 0.8253,
1025
+ "step": 680
1026
+ },
1027
+ {
1028
+ "epoch": 0.6586538461538461,
1029
+ "grad_norm": 3.253581255940029,
1030
+ "learning_rate": 9.15056499414049e-07,
1031
+ "loss": 1.0807,
1032
+ "step": 685
1033
+ },
1034
+ {
1035
+ "epoch": 0.6634615384615384,
1036
+ "grad_norm": 4.2515171628124975,
1037
+ "learning_rate": 9.005958511750684e-07,
1038
+ "loss": 0.8206,
1039
+ "step": 690
1040
+ },
1041
+ {
1042
+ "epoch": 0.6682692307692307,
1043
+ "grad_norm": 2.7617275421854526,
1044
+ "learning_rate": 8.864417800901062e-07,
1045
+ "loss": 0.9496,
1046
+ "step": 695
1047
+ },
1048
+ {
1049
+ "epoch": 0.6730769230769231,
1050
+ "grad_norm": 3.233107996911771,
1051
+ "learning_rate": 8.72592088407351e-07,
1052
+ "loss": 0.9023,
1053
+ "step": 700
1054
+ },
1055
+ {
1056
+ "epoch": 0.6778846153846154,
1057
+ "grad_norm": 3.1204863795886184,
1058
+ "learning_rate": 8.590445623898662e-07,
1059
+ "loss": 0.869,
1060
+ "step": 705
1061
+ },
1062
+ {
1063
+ "epoch": 0.6826923076923077,
1064
+ "grad_norm": 2.5285063680240234,
1065
+ "learning_rate": 8.457969719616223e-07,
1066
+ "loss": 0.9186,
1067
+ "step": 710
1068
+ },
1069
+ {
1070
+ "epoch": 0.6875,
1071
+ "grad_norm": 3.0506459039436336,
1072
+ "learning_rate": 8.32847070340265e-07,
1073
+ "loss": 0.9203,
1074
+ "step": 715
1075
+ },
1076
+ {
1077
+ "epoch": 0.6923076923076923,
1078
+ "grad_norm": 3.7957636063897318,
1079
+ "learning_rate": 8.201925936559198e-07,
1080
+ "loss": 0.9417,
1081
+ "step": 720
1082
+ },
1083
+ {
1084
+ "epoch": 0.6923076923076923,
1085
+ "eval_loss": 0.9345305562019348,
1086
+ "eval_runtime": 21.1147,
1087
+ "eval_samples_per_second": 9.472,
1088
+ "eval_steps_per_second": 2.368,
1089
+ "step": 720
1090
+ },
1091
+ {
1092
+ "epoch": 0.6971153846153846,
1093
+ "grad_norm": 3.3254122602539624,
1094
+ "learning_rate": 8.078312605552745e-07,
1095
+ "loss": 0.9107,
1096
+ "step": 725
1097
+ },
1098
+ {
1099
+ "epoch": 0.7019230769230769,
1100
+ "grad_norm": 2.8068324192286487,
1101
+ "learning_rate": 7.957607717901299e-07,
1102
+ "loss": 0.9438,
1103
+ "step": 730
1104
+ },
1105
+ {
1106
+ "epoch": 0.7067307692307693,
1107
+ "grad_norm": 3.498836942130792,
1108
+ "learning_rate": 7.839788097895564e-07,
1109
+ "loss": 0.8693,
1110
+ "step": 735
1111
+ },
1112
+ {
1113
+ "epoch": 0.7115384615384616,
1114
+ "grad_norm": 2.5787803338017885,
1115
+ "learning_rate": 7.72483038214722e-07,
1116
+ "loss": 0.896,
1117
+ "step": 740
1118
+ },
1119
+ {
1120
+ "epoch": 0.7163461538461539,
1121
+ "grad_norm": 3.67630240687256,
1122
+ "learning_rate": 7.612711014953991e-07,
1123
+ "loss": 0.8243,
1124
+ "step": 745
1125
+ },
1126
+ {
1127
+ "epoch": 0.7211538461538461,
1128
+ "grad_norm": 2.4521374343388125,
1129
+ "learning_rate": 7.503406243470673e-07,
1130
+ "loss": 1.0063,
1131
+ "step": 750
1132
+ },
1133
+ {
1134
+ "epoch": 0.7259615384615384,
1135
+ "grad_norm": 2.6536830050201536,
1136
+ "learning_rate": 7.396892112674676e-07,
1137
+ "loss": 0.8133,
1138
+ "step": 755
1139
+ },
1140
+ {
1141
+ "epoch": 0.7307692307692307,
1142
+ "grad_norm": 3.057951252038446,
1143
+ "learning_rate": 7.293144460113513e-07,
1144
+ "loss": 0.8753,
1145
+ "step": 760
1146
+ },
1147
+ {
1148
+ "epoch": 0.7355769230769231,
1149
+ "grad_norm": 2.3939129798326815,
1150
+ "learning_rate": 7.192138910420856e-07,
1151
+ "loss": 0.8277,
1152
+ "step": 765
1153
+ },
1154
+ {
1155
+ "epoch": 0.7403846153846154,
1156
+ "grad_norm": 2.8809002810189233,
1157
+ "learning_rate": 7.093850869586572e-07,
1158
+ "loss": 0.8746,
1159
+ "step": 770
1160
+ },
1161
+ {
1162
+ "epoch": 0.7451923076923077,
1163
+ "grad_norm": 3.272891692948664,
1164
+ "learning_rate": 6.998255518965055e-07,
1165
+ "loss": 0.8711,
1166
+ "step": 775
1167
+ },
1168
+ {
1169
+ "epoch": 0.75,
1170
+ "grad_norm": 3.1649449172099073,
1171
+ "learning_rate": 6.905327809004765e-07,
1172
+ "loss": 0.8073,
1173
+ "step": 780
1174
+ },
1175
+ {
1176
+ "epoch": 0.7548076923076923,
1177
+ "grad_norm": 2.862835029692555,
1178
+ "learning_rate": 6.815042452680482e-07,
1179
+ "loss": 0.852,
1180
+ "step": 785
1181
+ },
1182
+ {
1183
+ "epoch": 0.7596153846153846,
1184
+ "grad_norm": 4.777839902626332,
1185
+ "learning_rate": 6.727373918608166e-07,
1186
+ "loss": 0.7941,
1187
+ "step": 790
1188
+ },
1189
+ {
1190
+ "epoch": 0.7644230769230769,
1191
+ "grad_norm": 3.4663518671110403,
1192
+ "learning_rate": 6.642296423820508e-07,
1193
+ "loss": 0.8553,
1194
+ "step": 795
1195
+ },
1196
+ {
1197
+ "epoch": 0.7692307692307693,
1198
+ "grad_norm": 3.062550953679388,
1199
+ "learning_rate": 6.559783926179307e-07,
1200
+ "loss": 0.9623,
1201
+ "step": 800
1202
+ },
1203
+ {
1204
+ "epoch": 0.7692307692307693,
1205
+ "eval_loss": 0.9317355155944824,
1206
+ "eval_runtime": 21.1215,
1207
+ "eval_samples_per_second": 9.469,
1208
+ "eval_steps_per_second": 2.367,
1209
+ "step": 800
1210
+ },
1211
+ {
1212
+ "epoch": 0.7740384615384616,
1213
+ "grad_norm": 2.9850983787230145,
1214
+ "learning_rate": 6.479810116398562e-07,
1215
+ "loss": 0.9048,
1216
+ "step": 805
1217
+ },
1218
+ {
1219
+ "epoch": 0.7788461538461539,
1220
+ "grad_norm": 2.5686622431209387,
1221
+ "learning_rate": 6.40234840964976e-07,
1222
+ "loss": 0.7535,
1223
+ "step": 810
1224
+ },
1225
+ {
1226
+ "epoch": 0.7836538461538461,
1227
+ "grad_norm": 2.8469066270016894,
1228
+ "learning_rate": 6.327371936718024e-07,
1229
+ "loss": 0.8606,
1230
+ "step": 815
1231
+ },
1232
+ {
1233
+ "epoch": 0.7884615384615384,
1234
+ "grad_norm": 3.567677645668133,
1235
+ "learning_rate": 6.254853534674779e-07,
1236
+ "loss": 0.8133,
1237
+ "step": 820
1238
+ },
1239
+ {
1240
+ "epoch": 0.7932692307692307,
1241
+ "grad_norm": 2.331177876625003,
1242
+ "learning_rate": 6.184765737029068e-07,
1243
+ "loss": 0.921,
1244
+ "step": 825
1245
+ },
1246
+ {
1247
+ "epoch": 0.7980769230769231,
1248
+ "grad_norm": 2.684486602009453,
1249
+ "learning_rate": 6.117080763315794e-07,
1250
+ "loss": 0.8378,
1251
+ "step": 830
1252
+ },
1253
+ {
1254
+ "epoch": 0.8028846153846154,
1255
+ "grad_norm": 2.7951045757499546,
1256
+ "learning_rate": 6.051770508074766e-07,
1257
+ "loss": 0.7412,
1258
+ "step": 835
1259
+ },
1260
+ {
1261
+ "epoch": 0.8076923076923077,
1262
+ "grad_norm": 4.34395271902391,
1263
+ "learning_rate": 5.98880652916942e-07,
1264
+ "loss": 0.8488,
1265
+ "step": 840
1266
+ },
1267
+ {
1268
+ "epoch": 0.8125,
1269
+ "grad_norm": 2.4901987068339175,
1270
+ "learning_rate": 5.928160035388477e-07,
1271
+ "loss": 0.7888,
1272
+ "step": 845
1273
+ },
1274
+ {
1275
+ "epoch": 0.8173076923076923,
1276
+ "grad_norm": 3.410681331565254,
1277
+ "learning_rate": 5.869801873267336e-07,
1278
+ "loss": 0.9896,
1279
+ "step": 850
1280
+ },
1281
+ {
1282
+ "epoch": 0.8221153846153846,
1283
+ "grad_norm": 3.0373771991309715,
1284
+ "learning_rate": 5.813702513058679e-07,
1285
+ "loss": 0.7731,
1286
+ "step": 855
1287
+ },
1288
+ {
1289
+ "epoch": 0.8269230769230769,
1290
+ "grad_norm": 2.6095155256301656,
1291
+ "learning_rate": 5.759832033773325e-07,
1292
+ "loss": 0.9015,
1293
+ "step": 860
1294
+ },
1295
+ {
1296
+ "epoch": 0.8317307692307693,
1297
+ "grad_norm": 3.499761379842187,
1298
+ "learning_rate": 5.708160107202719e-07,
1299
+ "loss": 0.8423,
1300
+ "step": 865
1301
+ },
1302
+ {
1303
+ "epoch": 0.8365384615384616,
1304
+ "grad_norm": 2.63663041754238,
1305
+ "learning_rate": 5.658655980823239e-07,
1306
+ "loss": 0.8807,
1307
+ "step": 870
1308
+ },
1309
+ {
1310
+ "epoch": 0.8413461538461539,
1311
+ "grad_norm": 3.943874822020016,
1312
+ "learning_rate": 5.611288459469594e-07,
1313
+ "loss": 0.8609,
1314
+ "step": 875
1315
+ },
1316
+ {
1317
+ "epoch": 0.8461538461538461,
1318
+ "grad_norm": 2.9004043511306525,
1319
+ "learning_rate": 5.566025885649524e-07,
1320
+ "loss": 0.9654,
1321
+ "step": 880
1322
+ },
1323
+ {
1324
+ "epoch": 0.8461538461538461,
1325
+ "eval_loss": 0.9302033185958862,
1326
+ "eval_runtime": 21.0263,
1327
+ "eval_samples_per_second": 9.512,
1328
+ "eval_steps_per_second": 2.378,
1329
+ "step": 880
1330
+ },
1331
+ {
1332
+ "epoch": 0.8509615384615384,
1333
+ "grad_norm": 3.182299494802371,
1334
+ "learning_rate": 5.522836118354419e-07,
1335
+ "loss": 0.7406,
1336
+ "step": 885
1337
+ },
1338
+ {
1339
+ "epoch": 0.8557692307692307,
1340
+ "grad_norm": 3.1170335107274214,
1341
+ "learning_rate": 5.481686510199858e-07,
1342
+ "loss": 0.9893,
1343
+ "step": 890
1344
+ },
1345
+ {
1346
+ "epoch": 0.8605769230769231,
1347
+ "grad_norm": 2.437332494806209,
1348
+ "learning_rate": 5.442543882705713e-07,
1349
+ "loss": 0.9432,
1350
+ "step": 895
1351
+ },
1352
+ {
1353
+ "epoch": 0.8653846153846154,
1354
+ "grad_norm": 3.248411155382253,
1355
+ "learning_rate": 5.405374499496658e-07,
1356
+ "loss": 0.8199,
1357
+ "step": 900
1358
+ },
1359
+ {
1360
+ "epoch": 0.8701923076923077,
1361
+ "grad_norm": 3.699605668699813,
1362
+ "learning_rate": 5.370144037169503e-07,
1363
+ "loss": 0.8742,
1364
+ "step": 905
1365
+ },
1366
+ {
1367
+ "epoch": 0.875,
1368
+ "grad_norm": 4.418113021858762,
1369
+ "learning_rate": 5.336817553532644e-07,
1370
+ "loss": 0.8431,
1371
+ "step": 910
1372
+ },
1373
+ {
1374
+ "epoch": 0.8798076923076923,
1375
+ "grad_norm": 2.3988015404279874,
1376
+ "learning_rate": 5.305359452873153e-07,
1377
+ "loss": 0.8947,
1378
+ "step": 915
1379
+ },
1380
+ {
1381
+ "epoch": 0.8846153846153846,
1382
+ "grad_norm": 3.0267726009783766,
1383
+ "learning_rate": 5.275733447846792e-07,
1384
+ "loss": 0.7263,
1385
+ "step": 920
1386
+ },
1387
+ {
1388
+ "epoch": 0.8894230769230769,
1389
+ "grad_norm": 3.722228079235539,
1390
+ "learning_rate": 5.247902517512378e-07,
1391
+ "loss": 0.8365,
1392
+ "step": 925
1393
+ },
1394
+ {
1395
+ "epoch": 0.8942307692307693,
1396
+ "grad_norm": 2.603232021464912,
1397
+ "learning_rate": 5.221828860941111e-07,
1398
+ "loss": 1.0223,
1399
+ "step": 930
1400
+ },
1401
+ {
1402
+ "epoch": 0.8990384615384616,
1403
+ "grad_norm": 2.784717139792509,
1404
+ "learning_rate": 5.197473845718411e-07,
1405
+ "loss": 0.8666,
1406
+ "step": 935
1407
+ },
1408
+ {
1409
+ "epoch": 0.9038461538461539,
1410
+ "grad_norm": 2.864173244146164,
1411
+ "learning_rate": 5.174797950514308e-07,
1412
+ "loss": 0.7097,
1413
+ "step": 940
1414
+ },
1415
+ {
1416
+ "epoch": 0.9086538461538461,
1417
+ "grad_norm": 3.1016453769012395,
1418
+ "learning_rate": 5.153760700719024e-07,
1419
+ "loss": 0.9475,
1420
+ "step": 945
1421
+ },
1422
+ {
1423
+ "epoch": 0.9134615384615384,
1424
+ "grad_norm": 3.5038468947729973,
1425
+ "learning_rate": 5.13432059591097e-07,
1426
+ "loss": 0.8123,
1427
+ "step": 950
1428
+ },
1429
+ {
1430
+ "epoch": 0.9182692307692307,
1431
+ "grad_norm": 3.2927805818210407,
1432
+ "learning_rate": 5.116435027627297e-07,
1433
+ "loss": 0.8134,
1434
+ "step": 955
1435
+ },
1436
+ {
1437
+ "epoch": 0.9230769230769231,
1438
+ "grad_norm": 2.3328148005143747,
1439
+ "learning_rate": 5.100060185517474e-07,
1440
+ "loss": 0.9169,
1441
+ "step": 960
1442
+ },
1443
+ {
1444
+ "epoch": 0.9230769230769231,
1445
+ "eval_loss": 0.928638756275177,
1446
+ "eval_runtime": 21.0064,
1447
+ "eval_samples_per_second": 9.521,
1448
+ "eval_steps_per_second": 2.38,
1449
+ "step": 960
1450
+ },
1451
+ {
1452
+ "epoch": 0.9278846153846154,
1453
+ "grad_norm": 3.644838748812858,
1454
+ "learning_rate": 5.085150949442101e-07,
1455
+ "loss": 0.7718,
1456
+ "step": 965
1457
+ },
1458
+ {
1459
+ "epoch": 0.9326923076923077,
1460
+ "grad_norm": 2.7559502909140505,
1461
+ "learning_rate": 5.071660764378547e-07,
1462
+ "loss": 0.9096,
1463
+ "step": 970
1464
+ },
1465
+ {
1466
+ "epoch": 0.9375,
1467
+ "grad_norm": 2.5970949524935363,
1468
+ "learning_rate": 5.059541494031398e-07,
1469
+ "loss": 0.8835,
1470
+ "step": 975
1471
+ },
1472
+ {
1473
+ "epoch": 0.9423076923076923,
1474
+ "grad_norm": 2.1523312550723066,
1475
+ "learning_rate": 5.048743247693103e-07,
1476
+ "loss": 0.8909,
1477
+ "step": 980
1478
+ },
1479
+ {
1480
+ "epoch": 0.9471153846153846,
1481
+ "grad_norm": 5.2539613787039885,
1482
+ "learning_rate": 5.039214172958587e-07,
1483
+ "loss": 0.8688,
1484
+ "step": 985
1485
+ },
1486
+ {
1487
+ "epoch": 0.9519230769230769,
1488
+ "grad_norm": 2.9606045980250837,
1489
+ "learning_rate": 5.030900204036544e-07,
1490
+ "loss": 0.8714,
1491
+ "step": 990
1492
+ },
1493
+ {
1494
+ "epoch": 0.9567307692307693,
1495
+ "grad_norm": 2.939313716550038,
1496
+ "learning_rate": 5.023744751055416e-07,
1497
+ "loss": 0.9248,
1498
+ "step": 995
1499
+ },
1500
+ {
1501
+ "epoch": 0.9615384615384616,
1502
+ "grad_norm": 2.7776091933130473,
1503
+ "learning_rate": 5.017688308926548e-07,
1504
+ "loss": 0.8965,
1505
+ "step": 1000
1506
+ },
1507
+ {
1508
+ "epoch": 0.9663461538461539,
1509
+ "grad_norm": 3.3105407766408685,
1510
+ "learning_rate": 5.012667953109271e-07,
1511
+ "loss": 0.8606,
1512
+ "step": 1005
1513
+ },
1514
+ {
1515
+ "epoch": 0.9711538461538461,
1516
+ "grad_norm": 7.289088245652649,
1517
+ "learning_rate": 5.008616670245212e-07,
1518
+ "loss": 0.8847,
1519
+ "step": 1010
1520
+ },
1521
+ {
1522
+ "epoch": 0.9759615384615384,
1523
+ "grad_norm": 4.342531181739036,
1524
+ "learning_rate": 5.005462435953572e-07,
1525
+ "loss": 0.7237,
1526
+ "step": 1015
1527
+ },
1528
+ {
1529
+ "epoch": 0.9807692307692307,
1530
+ "grad_norm": 3.3798170801004304,
1531
+ "learning_rate": 5.003126880797421e-07,
1532
+ "loss": 0.9875,
1533
+ "step": 1020
1534
+ },
1535
+ {
1536
+ "epoch": 0.9855769230769231,
1537
+ "grad_norm": 2.413281341822416,
1538
+ "learning_rate": 5.00152322649041e-07,
1539
+ "loss": 0.8558,
1540
+ "step": 1025
1541
+ },
1542
+ {
1543
+ "epoch": 0.9903846153846154,
1544
+ "grad_norm": 3.479845931889368,
1545
+ "learning_rate": 5.000552759653955e-07,
1546
+ "loss": 0.6462,
1547
+ "step": 1030
1548
+ },
1549
+ {
1550
+ "epoch": 0.9951923076923077,
1551
+ "grad_norm": 3.6411495658522273,
1552
+ "learning_rate": 5.000097715024919e-07,
1553
+ "loss": 0.7703,
1554
+ "step": 1035
1555
+ },
1556
+ {
1557
+ "epoch": 1.0,
1558
+ "grad_norm": 2.04941647733406,
1559
+ "learning_rate": 5e-07,
1560
+ "loss": 0.9005,
1561
+ "step": 1040
1562
+ },
1563
+ {
1564
+ "epoch": 1.0,
1565
+ "eval_loss": 0.9287646412849426,
1566
+ "eval_runtime": 21.1549,
1567
+ "eval_samples_per_second": 9.454,
1568
+ "eval_steps_per_second": 2.364,
1569
+ "step": 1040
1570
+ },
1571
+ {
1572
+ "epoch": 1.0,
1573
+ "step": 1040,
1574
+ "total_flos": 8.200255844856627e+16,
1575
+ "train_loss": 0.8882445046534905,
1576
+ "train_runtime": 9157.2611,
1577
+ "train_samples_per_second": 3.18,
1578
+ "train_steps_per_second": 0.114
1579
+ }
1580
+ ],
1581
+ "logging_steps": 5,
1582
+ "max_steps": 1040,
1583
+ "num_input_tokens_seen": 0,
1584
+ "num_train_epochs": 1,
1585
+ "save_steps": 1040,
1586
+ "stateful_callbacks": {
1587
+ "TrainerControl": {
1588
+ "args": {
1589
+ "should_epoch_stop": false,
1590
+ "should_evaluate": false,
1591
+ "should_log": false,
1592
+ "should_save": true,
1593
+ "should_training_stop": true
1594
+ },
1595
+ "attributes": {}
1596
+ }
1597
+ },
1598
+ "total_flos": 8.200255844856627e+16,
1599
+ "train_batch_size": 2,
1600
+ "trial_name": null,
1601
+ "trial_params": null
1602
+ }
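For readers who want the evaluation curve rather than the raw diff, a short sketch that parses the trainer_state.json added above and prints the eval_loss logged every 80 steps. Field names are taken from the log_history entries in this commit; the local file path is an assumption.

```python
# Sketch: extract the evaluation-loss curve from the trainer_state.json added
# in this commit. Assumes the file has been downloaded to the working directory.
import json

with open("trainer_state.json") as f:
    state = json.load(f)

# log_history mixes training steps ("loss") and evaluation steps ("eval_loss");
# keep only the evaluation entries, which appear every eval_steps (80) steps.
eval_points = [(entry["step"], entry["eval_loss"])
               for entry in state["log_history"] if "eval_loss" in entry]

for step, loss in eval_points:
    print(f"step {step:>4}: eval_loss {loss:.4f}")
```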