borakaragul committed
Commit 3585b8b
1 parent: ef7f9e5

Delete checkpoint-800

checkpoint-800/README.md DELETED
@@ -1,202 +0,0 @@
- ---
- base_model: meta-llama/Llama-2-7b-hf
- library_name: peft
- ---
-
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
- ### Framework versions
-
- - PEFT 0.12.1.dev0
checkpoint-800/adapter_config.json DELETED
@@ -1,29 +0,0 @@
- {
- "alpha_pattern": {},
- "auto_mapping": null,
- "base_model_name_or_path": "meta-llama/Llama-2-7b-hf",
- "bias": "none",
- "fan_in_fan_out": false,
- "inference_mode": true,
- "init_lora_weights": true,
- "layer_replication": null,
- "layers_pattern": null,
- "layers_to_transform": null,
- "loftq_config": {},
- "lora_alpha": 32,
- "lora_dropout": 0.05,
- "megatron_config": null,
- "megatron_core": "megatron.core",
- "modules_to_save": null,
- "peft_type": "LORA",
- "r": 32,
- "rank_pattern": {},
- "revision": null,
- "target_modules": [
- "q_proj",
- "v_proj"
- ],
- "task_type": "CAUSAL_LM",
- "use_dora": false,
- "use_rslora": false
- }
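As a rough sanity check on the deleted adapter config, the sketch below estimates the LoRA parameter count implied by `r = 32` on `q_proj` and `v_proj`. The base-model dimensions (hidden size 4096, 32 decoder layers, square 4096x4096 `q_proj`/`v_proj`) are assumptions taken from the published Llama-2-7b architecture rather than from this commit, as is fp32 storage; `lora_param_count` is a hypothetical helper name.

```python
# Values copied from the deleted adapter_config.json.
R = 32
LORA_ALPHA = 32
TARGET_MODULES = ["q_proj", "v_proj"]

# Assumptions (not stated in this commit): Llama-2-7b dimensions.
HIDDEN_SIZE = 4096
NUM_LAYERS = 32


def lora_param_count(r, in_features, out_features):
    """Parameters in one LoRA pair: A is (r x in), B is (out x r)."""
    return r * in_features + out_features * r


per_module = lora_param_count(R, HIDDEN_SIZE, HIDDEN_SIZE)
total_params = per_module * len(TARGET_MODULES) * NUM_LAYERS
scaling = LORA_ALPHA / R  # plain alpha / r scaling, since use_rslora is false
fp32_bytes = total_params * 4  # assumption: adapter weights stored as fp32

print(total_params, scaling, fp32_bytes)
```

Under these assumptions the estimate is 16,777,216 parameters, or 67,108,864 bytes in fp32. That is within about 17 KB of the 67,126,104-byte size recorded in the `adapter_model.safetensors` pointer in this commit; the remainder would plausibly be the safetensors header.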
checkpoint-800/adapter_model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:58596161b97027bf077523720d5db60376c1cf9d64b96d2e47d5faafc48680bf
- size 67126104
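The three deleted lines are a Git LFS pointer, not the weights themselves. A minimal sketch of reading such a pointer, assuming the `key value` line layout shown in the diff; `parse_lfs_pointer` is a hypothetical helper and not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the adapter_model.safetensors diff.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:58596161b97027bf077523720d5db60376c1cf9d64b96d2e47d5faafc48680bf
size 67126104
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["hash_algo"], pointer["size"])
```

The same helper applies to the other pointer files removed in this commit (`optimizer.pt`, `rng_state.pth`, `scheduler.pt`, `training_args.bin`), since they share the layout.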
checkpoint-800/optimizer.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:08bc6daf511a2e67f0fdd99e95a84fc9638b2cf6ef239bd148cdf657f85c40f6
- size 134325882
checkpoint-800/rng_state.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:19b755ec2f18159d02c544c3c9708aefbcd6221fcc0ba746af3405b680c85e1d
- size 14244
checkpoint-800/scheduler.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:1926f1cdeda55ae5f3dce4298e3b979992324dc2f9793d32a19c0cdc5f23699e
- size 1064
checkpoint-800/trainer_state.json DELETED
@@ -1,657 +0,0 @@
- {
- "best_metric": null,
- "best_model_checkpoint": null,
- "epoch": 0.9603841536614646,
- "eval_steps": 100,
- "global_step": 800,
- "is_hyper_param_search": false,
- "is_local_process_zero": true,
- "is_world_process_zero": true,
- "log_history": [
- {
- "epoch": 0.012004801920768308,
- "grad_norm": 0.3538109064102173,
- "learning_rate": 9.879951980792317e-05,
- "loss": 2.3075,
- "step": 10
- },
- {
- "epoch": 0.024009603841536616,
- "grad_norm": 0.3431814908981323,
- "learning_rate": 9.759903961584634e-05,
- "loss": 2.1468,
- "step": 20
- },
- {
- "epoch": 0.03601440576230492,
- "grad_norm": 0.6265680193901062,
- "learning_rate": 9.639855942376951e-05,
- "loss": 2.0353,
- "step": 30
- },
- {
- "epoch": 0.04801920768307323,
- "grad_norm": 0.3244292736053467,
- "learning_rate": 9.519807923169268e-05,
- "loss": 2.0219,
- "step": 40
- },
- {
- "epoch": 0.060024009603841535,
- "grad_norm": 0.6210302114486694,
- "learning_rate": 9.399759903961585e-05,
- "loss": 1.9809,
- "step": 50
- },
- {
- "epoch": 0.07202881152460984,
- "grad_norm": 0.4695661962032318,
- "learning_rate": 9.279711884753903e-05,
- "loss": 1.9376,
- "step": 60
- },
- {
- "epoch": 0.08403361344537816,
- "grad_norm": 0.27379557490348816,
- "learning_rate": 9.159663865546218e-05,
- "loss": 1.923,
- "step": 70
- },
- {
- "epoch": 0.09603841536614646,
- "grad_norm": 0.4497829079627991,
- "learning_rate": 9.039615846338536e-05,
- "loss": 1.9875,
- "step": 80
- },
- {
- "epoch": 0.10804321728691477,
- "grad_norm": 0.3675350844860077,
- "learning_rate": 8.919567827130852e-05,
- "loss": 1.953,
- "step": 90
- },
- {
- "epoch": 0.12004801920768307,
- "grad_norm": 0.293821781873703,
- "learning_rate": 8.79951980792317e-05,
- "loss": 1.8549,
- "step": 100
- },
- {
- "epoch": 0.12004801920768307,
- "eval_loss": 1.889587640762329,
- "eval_runtime": 39.6181,
- "eval_samples_per_second": 25.241,
- "eval_steps_per_second": 4.215,
- "step": 100
- },
- {
- "epoch": 0.13205282112845138,
- "grad_norm": 0.25892049074172974,
- "learning_rate": 8.679471788715487e-05,
- "loss": 1.8372,
- "step": 110
- },
- {
- "epoch": 0.14405762304921968,
- "grad_norm": 0.33091431856155396,
- "learning_rate": 8.559423769507804e-05,
- "loss": 1.9107,
- "step": 120
- },
- {
- "epoch": 0.15606242496998798,
- "grad_norm": 0.3052417039871216,
- "learning_rate": 8.43937575030012e-05,
- "loss": 1.8123,
- "step": 130
- },
- {
- "epoch": 0.16806722689075632,
- "grad_norm": 0.3017849028110504,
- "learning_rate": 8.319327731092437e-05,
- "loss": 1.8966,
- "step": 140
- },
- {
- "epoch": 0.18007202881152462,
- "grad_norm": 0.28657111525535583,
- "learning_rate": 8.199279711884754e-05,
- "loss": 1.8395,
- "step": 150
- },
- {
- "epoch": 0.19207683073229292,
- "grad_norm": 0.3324660062789917,
- "learning_rate": 8.079231692677071e-05,
- "loss": 1.8841,
- "step": 160
- },
- {
- "epoch": 0.20408163265306123,
- "grad_norm": 0.28194233775138855,
- "learning_rate": 7.959183673469388e-05,
- "loss": 1.8627,
- "step": 170
- },
- {
- "epoch": 0.21608643457382953,
- "grad_norm": 0.45277833938598633,
- "learning_rate": 7.839135654261706e-05,
- "loss": 1.8989,
- "step": 180
- },
- {
- "epoch": 0.22809123649459784,
- "grad_norm": 0.3656202256679535,
- "learning_rate": 7.719087635054022e-05,
- "loss": 1.911,
- "step": 190
- },
- {
- "epoch": 0.24009603841536614,
- "grad_norm": 0.33644187450408936,
- "learning_rate": 7.599039615846338e-05,
- "loss": 1.7805,
- "step": 200
- },
- {
- "epoch": 0.24009603841536614,
- "eval_loss": 1.8699440956115723,
- "eval_runtime": 39.0269,
- "eval_samples_per_second": 25.623,
- "eval_steps_per_second": 4.279,
- "step": 200
- },
- {
- "epoch": 0.25210084033613445,
- "grad_norm": 0.24649439752101898,
- "learning_rate": 7.478991596638657e-05,
- "loss": 1.841,
- "step": 210
- },
- {
- "epoch": 0.26410564225690275,
- "grad_norm": 0.30980348587036133,
- "learning_rate": 7.358943577430972e-05,
- "loss": 1.9065,
- "step": 220
- },
- {
- "epoch": 0.27611044417767105,
- "grad_norm": 0.2731610834598541,
- "learning_rate": 7.23889555822329e-05,
- "loss": 1.8713,
- "step": 230
- },
- {
- "epoch": 0.28811524609843936,
- "grad_norm": 0.3211459517478943,
- "learning_rate": 7.118847539015606e-05,
- "loss": 1.8719,
- "step": 240
- },
- {
- "epoch": 0.30012004801920766,
- "grad_norm": 0.3356248736381531,
- "learning_rate": 6.998799519807924e-05,
- "loss": 1.8211,
- "step": 250
- },
- {
- "epoch": 0.31212484993997597,
- "grad_norm": 0.35724079608917236,
- "learning_rate": 6.878751500600241e-05,
- "loss": 1.8773,
- "step": 260
- },
- {
- "epoch": 0.3241296518607443,
- "grad_norm": 0.5718697905540466,
- "learning_rate": 6.758703481392558e-05,
- "loss": 1.8231,
- "step": 270
- },
- {
- "epoch": 0.33613445378151263,
- "grad_norm": 0.32961568236351013,
- "learning_rate": 6.638655462184874e-05,
- "loss": 1.9494,
- "step": 280
- },
- {
- "epoch": 0.34813925570228094,
- "grad_norm": 0.3693414628505707,
- "learning_rate": 6.518607442977191e-05,
- "loss": 1.835,
- "step": 290
- },
- {
- "epoch": 0.36014405762304924,
- "grad_norm": 0.4198884665966034,
- "learning_rate": 6.398559423769508e-05,
- "loss": 1.8722,
- "step": 300
- },
- {
- "epoch": 0.36014405762304924,
- "eval_loss": 1.8641259670257568,
- "eval_runtime": 39.2717,
- "eval_samples_per_second": 25.464,
- "eval_steps_per_second": 4.252,
- "step": 300
- },
- {
- "epoch": 0.37214885954381755,
- "grad_norm": 0.3070141077041626,
- "learning_rate": 6.278511404561825e-05,
- "loss": 1.8771,
- "step": 310
- },
- {
- "epoch": 0.38415366146458585,
- "grad_norm": 0.43982160091400146,
- "learning_rate": 6.158463385354142e-05,
- "loss": 1.8954,
- "step": 320
- },
- {
- "epoch": 0.39615846338535415,
- "grad_norm": 0.2642657458782196,
- "learning_rate": 6.038415366146459e-05,
- "loss": 1.8977,
- "step": 330
- },
- {
- "epoch": 0.40816326530612246,
- "grad_norm": 0.25595635175704956,
- "learning_rate": 5.918367346938776e-05,
- "loss": 1.8949,
- "step": 340
- },
- {
- "epoch": 0.42016806722689076,
- "grad_norm": 0.27742770314216614,
- "learning_rate": 5.7983193277310935e-05,
- "loss": 1.8223,
- "step": 350
- },
- {
- "epoch": 0.43217286914765907,
- "grad_norm": 0.2831798791885376,
- "learning_rate": 5.6782713085234096e-05,
- "loss": 1.8477,
- "step": 360
- },
- {
- "epoch": 0.44417767106842737,
- "grad_norm": 0.3077690005302429,
- "learning_rate": 5.558223289315727e-05,
- "loss": 1.9376,
- "step": 370
- },
- {
- "epoch": 0.4561824729891957,
- "grad_norm": 0.5014916658401489,
- "learning_rate": 5.438175270108043e-05,
- "loss": 1.9241,
- "step": 380
- },
- {
- "epoch": 0.468187274909964,
- "grad_norm": 0.7313567399978638,
- "learning_rate": 5.31812725090036e-05,
- "loss": 1.8995,
- "step": 390
- },
- {
- "epoch": 0.4801920768307323,
- "grad_norm": 0.26589518785476685,
- "learning_rate": 5.1980792316926776e-05,
- "loss": 1.95,
- "step": 400
- },
- {
- "epoch": 0.4801920768307323,
- "eval_loss": 1.8611316680908203,
- "eval_runtime": 39.4525,
- "eval_samples_per_second": 25.347,
- "eval_steps_per_second": 4.233,
- "step": 400
- },
- {
- "epoch": 0.4921968787515006,
- "grad_norm": 0.293312668800354,
- "learning_rate": 5.078031212484994e-05,
- "loss": 1.8592,
- "step": 410
- },
- {
- "epoch": 0.5042016806722689,
- "grad_norm": 0.35831350088119507,
- "learning_rate": 4.957983193277311e-05,
- "loss": 1.9295,
- "step": 420
- },
- {
- "epoch": 0.5162064825930373,
- "grad_norm": 0.33949533104896545,
- "learning_rate": 4.837935174069628e-05,
- "loss": 1.8581,
- "step": 430
- },
- {
- "epoch": 0.5282112845138055,
- "grad_norm": 0.5358735918998718,
- "learning_rate": 4.717887154861945e-05,
- "loss": 1.8824,
- "step": 440
- },
- {
- "epoch": 0.5402160864345739,
- "grad_norm": 0.3915724456310272,
- "learning_rate": 4.5978391356542624e-05,
- "loss": 1.8699,
- "step": 450
- },
- {
- "epoch": 0.5522208883553421,
- "grad_norm": 0.3336695730686188,
- "learning_rate": 4.477791116446579e-05,
- "loss": 1.9749,
- "step": 460
- },
- {
- "epoch": 0.5642256902761105,
- "grad_norm": 0.3887428641319275,
- "learning_rate": 4.3577430972388954e-05,
- "loss": 1.8134,
- "step": 470
- },
- {
- "epoch": 0.5762304921968787,
- "grad_norm": 0.26423540711402893,
- "learning_rate": 4.237695078031212e-05,
- "loss": 1.8495,
- "step": 480
- },
- {
- "epoch": 0.5882352941176471,
- "grad_norm": 0.30970582365989685,
- "learning_rate": 4.11764705882353e-05,
- "loss": 1.8386,
- "step": 490
- },
- {
- "epoch": 0.6002400960384153,
- "grad_norm": 0.38180047273635864,
- "learning_rate": 3.9975990396158466e-05,
- "loss": 1.8729,
- "step": 500
- },
- {
- "epoch": 0.6002400960384153,
- "eval_loss": 1.8590781688690186,
- "eval_runtime": 39.0438,
- "eval_samples_per_second": 25.612,
- "eval_steps_per_second": 4.277,
- "step": 500
- },
- {
- "epoch": 0.6122448979591837,
- "grad_norm": 0.4111509621143341,
- "learning_rate": 3.8775510204081634e-05,
- "loss": 1.833,
- "step": 510
- },
- {
- "epoch": 0.6242496998799519,
- "grad_norm": 0.515201210975647,
- "learning_rate": 3.75750300120048e-05,
- "loss": 1.8744,
- "step": 520
- },
- {
- "epoch": 0.6362545018007203,
- "grad_norm": 0.2900288701057434,
- "learning_rate": 3.637454981992797e-05,
- "loss": 1.8364,
- "step": 530
- },
- {
- "epoch": 0.6482593037214885,
- "grad_norm": 0.2963515520095825,
- "learning_rate": 3.517406962785114e-05,
- "loss": 1.8469,
- "step": 540
- },
- {
- "epoch": 0.6602641056422569,
- "grad_norm": 0.41035813093185425,
- "learning_rate": 3.3973589435774314e-05,
- "loss": 1.9474,
- "step": 550
- },
- {
- "epoch": 0.6722689075630253,
- "grad_norm": 0.246292382478714,
- "learning_rate": 3.277310924369748e-05,
- "loss": 1.8128,
- "step": 560
- },
- {
- "epoch": 0.6842737094837935,
- "grad_norm": 0.3911287188529968,
- "learning_rate": 3.157262905162065e-05,
- "loss": 1.8663,
- "step": 570
- },
- {
- "epoch": 0.6962785114045619,
- "grad_norm": 0.3035126328468323,
- "learning_rate": 3.037214885954382e-05,
- "loss": 1.8921,
- "step": 580
- },
- {
- "epoch": 0.7082833133253301,
- "grad_norm": 0.2977759540081024,
- "learning_rate": 2.917166866746699e-05,
- "loss": 1.8288,
- "step": 590
- },
- {
- "epoch": 0.7202881152460985,
- "grad_norm": 0.3059289753437042,
- "learning_rate": 2.797118847539016e-05,
- "loss": 1.8482,
- "step": 600
- },
- {
- "epoch": 0.7202881152460985,
- "eval_loss": 1.8568615913391113,
- "eval_runtime": 39.5808,
- "eval_samples_per_second": 25.265,
- "eval_steps_per_second": 4.219,
- "step": 600
- },
- {
- "epoch": 0.7322929171668667,
- "grad_norm": 0.36612552404403687,
- "learning_rate": 2.6770708283313327e-05,
- "loss": 1.8577,
- "step": 610
- },
- {
- "epoch": 0.7442977190876351,
- "grad_norm": 0.2968533933162689,
- "learning_rate": 2.5570228091236498e-05,
- "loss": 1.8295,
- "step": 620
- },
- {
- "epoch": 0.7563025210084033,
- "grad_norm": 0.40536123514175415,
- "learning_rate": 2.4369747899159663e-05,
- "loss": 1.9242,
- "step": 630
- },
- {
- "epoch": 0.7683073229291717,
- "grad_norm": 0.3592563271522522,
- "learning_rate": 2.3169267707082835e-05,
- "loss": 1.8677,
- "step": 640
- },
- {
- "epoch": 0.78031212484994,
- "grad_norm": 0.3828364908695221,
- "learning_rate": 2.1968787515006003e-05,
- "loss": 1.9055,
- "step": 650
- },
- {
- "epoch": 0.7923169267707083,
- "grad_norm": 0.26257479190826416,
- "learning_rate": 2.076830732292917e-05,
- "loss": 1.9383,
- "step": 660
- },
- {
- "epoch": 0.8043217286914766,
- "grad_norm": 0.3253800570964813,
- "learning_rate": 1.9567827130852343e-05,
- "loss": 1.8142,
- "step": 670
- },
- {
- "epoch": 0.8163265306122449,
- "grad_norm": 0.3129618167877197,
- "learning_rate": 1.836734693877551e-05,
- "loss": 1.8447,
- "step": 680
- },
- {
- "epoch": 0.8283313325330132,
- "grad_norm": 0.32753467559814453,
- "learning_rate": 1.7166866746698683e-05,
- "loss": 1.8617,
- "step": 690
- },
- {
- "epoch": 0.8403361344537815,
- "grad_norm": 0.28356197476387024,
- "learning_rate": 1.5966386554621848e-05,
- "loss": 1.768,
- "step": 700
- },
- {
- "epoch": 0.8403361344537815,
- "eval_loss": 1.8553369045257568,
- "eval_runtime": 39.3665,
- "eval_samples_per_second": 25.402,
- "eval_steps_per_second": 4.242,
- "step": 700
- },
- {
- "epoch": 0.8523409363745498,
- "grad_norm": 0.42871856689453125,
- "learning_rate": 1.4765906362545018e-05,
- "loss": 1.8756,
- "step": 710
- },
- {
- "epoch": 0.8643457382953181,
- "grad_norm": 0.32104918360710144,
- "learning_rate": 1.3565426170468188e-05,
- "loss": 1.8396,
- "step": 720
- },
- {
- "epoch": 0.8763505402160864,
- "grad_norm": 0.3462695777416229,
- "learning_rate": 1.2364945978391356e-05,
- "loss": 1.8499,
- "step": 730
- },
- {
- "epoch": 0.8883553421368547,
- "grad_norm": 0.38381311297416687,
- "learning_rate": 1.1164465786314526e-05,
- "loss": 1.8353,
- "step": 740
- },
- {
- "epoch": 0.9003601440576231,
- "grad_norm": 0.48733091354370117,
- "learning_rate": 9.963985594237696e-06,
- "loss": 1.8566,
- "step": 750
- },
- {
- "epoch": 0.9123649459783914,
- "grad_norm": 0.35725510120391846,
- "learning_rate": 8.763505402160866e-06,
- "loss": 1.9016,
- "step": 760
- },
- {
- "epoch": 0.9243697478991597,
- "grad_norm": 0.2945646047592163,
- "learning_rate": 7.563025210084033e-06,
- "loss": 1.7727,
- "step": 770
- },
- {
- "epoch": 0.936374549819928,
- "grad_norm": 0.2922873795032501,
- "learning_rate": 6.362545018007203e-06,
- "loss": 1.8482,
- "step": 780
- },
- {
- "epoch": 0.9483793517406963,
- "grad_norm": 0.29431605339050293,
- "learning_rate": 5.162064825930372e-06,
- "loss": 1.8867,
- "step": 790
- },
- {
- "epoch": 0.9603841536614646,
- "grad_norm": 0.32220232486724854,
- "learning_rate": 3.9615846338535415e-06,
- "loss": 1.8904,
- "step": 800
- },
- {
- "epoch": 0.9603841536614646,
- "eval_loss": 1.854928970336914,
- "eval_runtime": 39.3819,
- "eval_samples_per_second": 25.392,
- "eval_steps_per_second": 4.241,
- "step": 800
- }
- ],
- "logging_steps": 10,
- "max_steps": 833,
- "num_input_tokens_seen": 0,
- "num_train_epochs": 1,
- "save_steps": 100,
- "stateful_callbacks": {
- "TrainerControl": {
- "args": {
- "should_epoch_stop": false,
- "should_evaluate": false,
- "should_log": false,
- "should_save": true,
- "should_training_stop": false
- },
- "attributes": {}
- }
- },
- "total_flos": 1.3189398273825178e+17,
- "train_batch_size": 12,
- "trial_name": null,
- "trial_params": null
- }
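The logged learning rates are consistent with a linear decay to zero over `max_steps = 833`. The sketch below checks that reading against a few logged values. The peak learning rate of 1e-4 and the absence of warmup are inferences from the log, not values stated in this commit, and `linear_lr` is a hypothetical helper.

```python
import math

MAX_STEPS = 833   # "max_steps" in the deleted trainer_state.json
PEAK_LR = 1e-4    # assumption: inferred from the log, not stated in the commit


def linear_lr(step, peak_lr=PEAK_LR, max_steps=MAX_STEPS):
    """Linear decay from peak_lr at step 0 to zero at max_steps (no warmup)."""
    return peak_lr * (max_steps - step) / max_steps


# (step, learning_rate) pairs copied from log_history.
LOGGED = [
    (10, 9.879951980792317e-05),
    (400, 5.1980792316926776e-05),
    (800, 3.9615846338535415e-06),
]

for step, logged_lr in LOGGED:
    # Each logged value should match the assumed schedule closely.
    assert math.isclose(linear_lr(step), logged_lr, rel_tol=1e-9)

# For a one-epoch run, the logged epoch should equal step / max_steps.
epoch_at_800 = 800 / MAX_STEPS
print(epoch_at_800)
```

If the assertions hold, the checkpoint at step 800 sits 33 steps short of the end of the single planned epoch, which agrees with the logged `"epoch": 0.9603841536614646`.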
checkpoint-800/training_args.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:74895195f52799cb4b367d00b4d32efdbf28d7d8df31ca68b9fea853fe7beb99
- size 5368