cgihlstorf committed
Commit 7f575d5
1 Parent(s): 4a8f050

Upload 8 files

README.md CHANGED
@@ -1,3 +1,202 @@
- ---
- license: apache-2.0
- ---
+ ---
+ library_name: peft
+ base_model: meta-llama/Llama-2-7b-hf
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.10.0
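The model card's "How to Get Started" section is still a placeholder. A minimal loading sketch, assuming the adapter files from this commit sit in a local directory (`ADAPTER_PATH` is a placeholder) and that you have access to the gated base model named in adapter_config.json:

```python
# Hedged sketch: load the LoRA adapter from this commit on top of its base model.
# Requires `pip install transformers peft` and access to the gated
# meta-llama/Llama-2-7b-hf weights; ADAPTER_PATH is a placeholder.
BASE_MODEL = "meta-llama/Llama-2-7b-hf"  # from adapter_config.json
ADAPTER_PATH = "."  # directory holding adapter_config.json + adapter_model.bin

def load_adapter_model():
    # Imports are deferred so the sketch's heavy dependencies are visible here.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
    model = PeftModel.from_pretrained(base, ADAPTER_PATH)
    model.eval()  # adapter_config.json sets inference_mode: true
    return tokenizer, model
```

PEFT 0.10.0 (the version listed above) saved this adapter, so any PEFT release at or above that should deserialize it.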
adapter_config.json ADDED
@@ -0,0 +1,29 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "meta-llama/Llama-2-7b-hf",
+ "bias": "none",
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 16,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 8,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "q_proj",
+ "v_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "use_dora": false,
+ "use_rslora": false
+ }
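The config above (rank 8 LoRA on `q_proj` and `v_proj` only) implies a small adapter, which can be sanity-checked with arithmetic. The Llama-2-7b shapes used here (32 decoder layers, hidden size 4096, square q/v projections) are assumptions about the base model, not stated in the config:

```python
# Back-of-the-envelope check of the adapter size implied by adapter_config.json.
# Assumed (not in the config): Llama-2-7b has 32 layers, hidden size 4096,
# and q_proj/v_proj are square 4096x4096 projections.
r = 8                  # "r" in adapter_config.json
hidden = 4096
layers = 32
modules_per_layer = 2  # target_modules: q_proj and v_proj

# Each LoRA-adapted module adds A (r x hidden) and B (hidden x r) matrices.
params_per_module = r * hidden + hidden * r
total_params = params_per_module * modules_per_layer * layers
print(total_params)      # 4194304 trainable parameters
print(total_params * 4)  # 16777216 bytes at fp32
```

At 4 bytes per fp32 weight this comes to ~16.8 MB, which lines up with the 16,823,434-byte adapter_model.bin in this commit (the small difference being serialization overhead).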
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:606e68bfa25399f65b51d4d0717bf4caf35c3f46881fd1808ca8ecdf30d35a25
+ size 16823434
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d33cc8f054bd9adb6542e659392973c6d3c68481190e0a4cdf7638c9d9d606c9
+ size 33662074
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bab528e5cc79c8b0170c4528aa77b2350a561c1d11f0f78ae4dcd1e1a5b50970
+ size 14244
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66a1e257aac9158b75c35ef44d4eae055e511d1f723567e8ddee4e90477b4e3a
+ size 1064
trainer_state.json ADDED
@@ -0,0 +1,1173 @@
+ {
+ "best_metric": 1.5669885873794556,
+ "best_model_checkpoint": "/scratch/czm5kz/llama2-7b_8_50_0.0003_sg_finetuned_with_output/checkpoint-180",
+ "epoch": 47.407407407407405,
+ "eval_steps": 20,
+ "global_step": 640,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.37,
+ "grad_norm": 2.398926019668579,
+ "learning_rate": 0.0002986153846153846,
+ "loss": 5.3458,
+ "step": 5
+ },
+ {
+ "epoch": 0.74,
+ "grad_norm": 1.079797387123108,
+ "learning_rate": 0.0002963076923076923,
+ "loss": 4.6876,
+ "step": 10
+ },
+ {
+ "epoch": 1.11,
+ "grad_norm": 7.786762237548828,
+ "learning_rate": 0.000294,
+ "loss": 4.3658,
+ "step": 15
+ },
+ {
+ "epoch": 1.48,
+ "grad_norm": 1.545445442199707,
+ "learning_rate": 0.0002916923076923077,
+ "loss": 3.4792,
+ "step": 20
+ },
+ {
+ "epoch": 1.48,
+ "eval_loss": 3.3610048294067383,
+ "eval_runtime": 0.4367,
+ "eval_samples_per_second": 61.822,
+ "eval_steps_per_second": 9.159,
+ "step": 20
+ },
+ {
+ "epoch": 1.85,
+ "grad_norm": 1.6737030744552612,
+ "learning_rate": 0.0002893846153846154,
+ "loss": 3.0828,
+ "step": 25
+ },
+ {
+ "epoch": 2.22,
+ "grad_norm": 2.414368152618408,
+ "learning_rate": 0.00028707692307692305,
+ "loss": 2.6784,
+ "step": 30
+ },
+ {
+ "epoch": 2.59,
+ "grad_norm": 3.006044864654541,
+ "learning_rate": 0.00028476923076923075,
+ "loss": 2.4781,
+ "step": 35
+ },
+ {
+ "epoch": 2.96,
+ "grad_norm": 2.7223751544952393,
+ "learning_rate": 0.00028246153846153845,
+ "loss": 2.2626,
+ "step": 40
+ },
+ {
+ "epoch": 2.96,
+ "eval_loss": 2.527669906616211,
+ "eval_runtime": 0.4336,
+ "eval_samples_per_second": 62.265,
+ "eval_steps_per_second": 9.224,
+ "step": 40
+ },
+ {
+ "epoch": 3.33,
+ "grad_norm": 4.4172210693359375,
+ "learning_rate": 0.00028015384615384615,
+ "loss": 1.8627,
+ "step": 45
+ },
+ {
+ "epoch": 3.7,
+ "grad_norm": 3.698852062225342,
+ "learning_rate": 0.0002778461538461538,
+ "loss": 1.6018,
+ "step": 50
+ },
+ {
+ "epoch": 4.07,
+ "grad_norm": 3.051743984222412,
+ "learning_rate": 0.0002755384615384615,
+ "loss": 1.4827,
+ "step": 55
+ },
+ {
+ "epoch": 4.44,
+ "grad_norm": 5.160757064819336,
+ "learning_rate": 0.0002732307692307692,
+ "loss": 1.0585,
+ "step": 60
+ },
+ {
+ "epoch": 4.44,
+ "eval_loss": 2.1469380855560303,
+ "eval_runtime": 0.4338,
+ "eval_samples_per_second": 62.246,
+ "eval_steps_per_second": 9.222,
+ "step": 60
+ },
+ {
+ "epoch": 4.81,
+ "grad_norm": 4.5085062980651855,
+ "learning_rate": 0.0002709230769230769,
+ "loss": 1.0433,
+ "step": 65
+ },
+ {
+ "epoch": 5.19,
+ "grad_norm": 3.886779546737671,
+ "learning_rate": 0.00026861538461538456,
+ "loss": 0.9697,
+ "step": 70
+ },
+ {
+ "epoch": 5.56,
+ "grad_norm": 4.663851261138916,
+ "learning_rate": 0.00026630769230769226,
+ "loss": 0.6188,
+ "step": 75
+ },
+ {
+ "epoch": 5.93,
+ "grad_norm": 5.576120853424072,
+ "learning_rate": 0.00026399999999999997,
+ "loss": 0.7594,
+ "step": 80
+ },
+ {
+ "epoch": 5.93,
+ "eval_loss": 1.739160418510437,
+ "eval_runtime": 0.4327,
+ "eval_samples_per_second": 62.404,
+ "eval_steps_per_second": 9.245,
+ "step": 80
+ },
+ {
+ "epoch": 6.3,
+ "grad_norm": 2.55850887298584,
+ "learning_rate": 0.00026169230769230767,
+ "loss": 0.5985,
+ "step": 85
+ },
+ {
+ "epoch": 6.67,
+ "grad_norm": 3.429755687713623,
+ "learning_rate": 0.00025938461538461537,
+ "loss": 0.4957,
+ "step": 90
+ },
+ {
+ "epoch": 7.04,
+ "grad_norm": 4.753056526184082,
+ "learning_rate": 0.000257076923076923,
+ "loss": 0.6527,
+ "step": 95
+ },
+ {
+ "epoch": 7.41,
+ "grad_norm": 2.638728380203247,
+ "learning_rate": 0.0002547692307692307,
+ "loss": 0.4148,
+ "step": 100
+ },
+ {
+ "epoch": 7.41,
+ "eval_loss": 1.5832433700561523,
+ "eval_runtime": 0.4351,
+ "eval_samples_per_second": 62.055,
+ "eval_steps_per_second": 9.193,
+ "step": 100
+ },
+ {
+ "epoch": 7.78,
+ "grad_norm": 2.906188726425171,
+ "learning_rate": 0.0002524615384615384,
+ "loss": 0.4678,
+ "step": 105
+ },
+ {
+ "epoch": 8.15,
+ "grad_norm": 2.0787110328674316,
+ "learning_rate": 0.00025015384615384613,
+ "loss": 0.4415,
+ "step": 110
+ },
+ {
+ "epoch": 8.52,
+ "grad_norm": 2.6470298767089844,
+ "learning_rate": 0.00024784615384615383,
+ "loss": 0.3638,
+ "step": 115
+ },
+ {
+ "epoch": 8.89,
+ "grad_norm": 3.3667097091674805,
+ "learning_rate": 0.00024553846153846154,
+ "loss": 0.4925,
+ "step": 120
+ },
+ {
+ "epoch": 8.89,
+ "eval_loss": 1.6094781160354614,
+ "eval_runtime": 0.4339,
+ "eval_samples_per_second": 62.231,
+ "eval_steps_per_second": 9.219,
+ "step": 120
+ },
+ {
+ "epoch": 9.26,
+ "grad_norm": 1.8297497034072876,
+ "learning_rate": 0.0002432307692307692,
+ "loss": 0.3612,
+ "step": 125
+ },
+ {
+ "epoch": 9.63,
+ "grad_norm": 2.8483166694641113,
+ "learning_rate": 0.0002409230769230769,
+ "loss": 0.39,
+ "step": 130
+ },
+ {
+ "epoch": 10.0,
+ "grad_norm": 3.550173282623291,
+ "learning_rate": 0.0002386153846153846,
+ "loss": 0.4645,
+ "step": 135
+ },
+ {
+ "epoch": 10.37,
+ "grad_norm": 1.982654333114624,
+ "learning_rate": 0.0002363076923076923,
+ "loss": 0.3114,
+ "step": 140
+ },
+ {
+ "epoch": 10.37,
+ "eval_loss": 1.7044697999954224,
+ "eval_runtime": 0.4349,
+ "eval_samples_per_second": 62.08,
+ "eval_steps_per_second": 9.197,
+ "step": 140
+ },
+ {
+ "epoch": 10.74,
+ "grad_norm": 2.9532089233398438,
+ "learning_rate": 0.000234,
+ "loss": 0.4241,
+ "step": 145
+ },
+ {
+ "epoch": 11.11,
+ "grad_norm": 1.4060415029525757,
+ "learning_rate": 0.0002316923076923077,
+ "loss": 0.3897,
+ "step": 150
+ },
+ {
+ "epoch": 11.48,
+ "grad_norm": 1.5898511409759521,
+ "learning_rate": 0.00022938461538461535,
+ "loss": 0.3231,
+ "step": 155
+ },
+ {
+ "epoch": 11.85,
+ "grad_norm": 2.111783266067505,
+ "learning_rate": 0.00022707692307692305,
+ "loss": 0.4056,
+ "step": 160
+ },
+ {
+ "epoch": 11.85,
+ "eval_loss": 1.7500680685043335,
+ "eval_runtime": 0.4341,
+ "eval_samples_per_second": 62.203,
+ "eval_steps_per_second": 9.215,
+ "step": 160
+ },
+ {
+ "epoch": 12.22,
+ "grad_norm": 1.6774641275405884,
+ "learning_rate": 0.00022476923076923075,
+ "loss": 0.3562,
+ "step": 165
+ },
+ {
+ "epoch": 12.59,
+ "grad_norm": 1.5955278873443604,
+ "learning_rate": 0.00022246153846153846,
+ "loss": 0.3484,
+ "step": 170
+ },
+ {
+ "epoch": 12.96,
+ "grad_norm": 1.2137044668197632,
+ "learning_rate": 0.00022015384615384613,
+ "loss": 0.3616,
+ "step": 175
+ },
+ {
+ "epoch": 13.33,
+ "grad_norm": 1.3046234846115112,
+ "learning_rate": 0.00021784615384615383,
+ "loss": 0.3443,
+ "step": 180
+ },
+ {
+ "epoch": 13.33,
+ "eval_loss": 1.5669885873794556,
+ "eval_runtime": 0.4348,
+ "eval_samples_per_second": 62.098,
+ "eval_steps_per_second": 9.2,
+ "step": 180
+ },
+ {
+ "epoch": 13.7,
+ "grad_norm": 2.3079042434692383,
+ "learning_rate": 0.00021553846153846154,
+ "loss": 0.3503,
+ "step": 185
+ },
+ {
+ "epoch": 14.07,
+ "grad_norm": 0.9355862736701965,
+ "learning_rate": 0.00021323076923076921,
+ "loss": 0.3417,
+ "step": 190
+ },
+ {
+ "epoch": 14.44,
+ "grad_norm": 1.706770420074463,
+ "learning_rate": 0.0002109230769230769,
+ "loss": 0.3187,
+ "step": 195
+ },
+ {
+ "epoch": 14.81,
+ "grad_norm": 1.9932715892791748,
+ "learning_rate": 0.0002086153846153846,
+ "loss": 0.3489,
+ "step": 200
+ },
+ {
+ "epoch": 14.81,
+ "eval_loss": 1.6050857305526733,
+ "eval_runtime": 0.4347,
+ "eval_samples_per_second": 62.115,
+ "eval_steps_per_second": 9.202,
+ "step": 200
+ },
+ {
+ "epoch": 15.19,
+ "grad_norm": 0.947511613368988,
+ "learning_rate": 0.0002063076923076923,
+ "loss": 0.3556,
+ "step": 205
+ },
+ {
+ "epoch": 15.56,
+ "grad_norm": 1.3406760692596436,
+ "learning_rate": 0.000204,
+ "loss": 0.3065,
+ "step": 210
+ },
+ {
+ "epoch": 15.93,
+ "grad_norm": 1.2684600353240967,
+ "learning_rate": 0.00020169230769230767,
+ "loss": 0.3413,
+ "step": 215
+ },
+ {
+ "epoch": 16.3,
+ "grad_norm": 1.7013931274414062,
+ "learning_rate": 0.00019938461538461538,
+ "loss": 0.3156,
+ "step": 220
+ },
+ {
+ "epoch": 16.3,
+ "eval_loss": 1.7684317827224731,
+ "eval_runtime": 0.4353,
+ "eval_samples_per_second": 62.027,
+ "eval_steps_per_second": 9.189,
+ "step": 220
+ },
+ {
+ "epoch": 16.67,
+ "grad_norm": 1.1557570695877075,
+ "learning_rate": 0.00019707692307692305,
+ "loss": 0.3084,
+ "step": 225
+ },
+ {
+ "epoch": 17.04,
+ "grad_norm": 1.1206583976745605,
+ "learning_rate": 0.00019476923076923076,
+ "loss": 0.4039,
+ "step": 230
+ },
+ {
+ "epoch": 17.41,
+ "grad_norm": 1.4177170991897583,
+ "learning_rate": 0.00019246153846153843,
+ "loss": 0.3247,
+ "step": 235
+ },
+ {
+ "epoch": 17.78,
+ "grad_norm": 1.4315319061279297,
+ "learning_rate": 0.00019015384615384613,
+ "loss": 0.3107,
+ "step": 240
+ },
+ {
+ "epoch": 17.78,
+ "eval_loss": 1.6817841529846191,
+ "eval_runtime": 0.4335,
+ "eval_samples_per_second": 62.287,
+ "eval_steps_per_second": 9.228,
+ "step": 240
+ },
+ {
+ "epoch": 18.15,
+ "grad_norm": 1.2824641466140747,
+ "learning_rate": 0.00018784615384615384,
+ "loss": 0.3149,
+ "step": 245
+ },
+ {
+ "epoch": 18.52,
+ "grad_norm": 1.095780849456787,
+ "learning_rate": 0.00018553846153846154,
+ "loss": 0.2916,
+ "step": 250
+ },
+ {
+ "epoch": 18.89,
+ "grad_norm": 1.2812676429748535,
+ "learning_rate": 0.00018323076923076922,
+ "loss": 0.3353,
+ "step": 255
+ },
+ {
+ "epoch": 19.26,
+ "grad_norm": 1.171350359916687,
+ "learning_rate": 0.0001809230769230769,
+ "loss": 0.309,
+ "step": 260
+ },
+ {
+ "epoch": 19.26,
+ "eval_loss": 1.7549753189086914,
+ "eval_runtime": 0.4347,
+ "eval_samples_per_second": 62.113,
+ "eval_steps_per_second": 9.202,
+ "step": 260
+ },
+ {
+ "epoch": 19.63,
+ "grad_norm": 1.1714686155319214,
+ "learning_rate": 0.0001786153846153846,
+ "loss": 0.3119,
+ "step": 265
+ },
+ {
+ "epoch": 20.0,
+ "grad_norm": 1.0897107124328613,
+ "learning_rate": 0.0001763076923076923,
+ "loss": 0.326,
+ "step": 270
+ },
+ {
+ "epoch": 20.37,
+ "grad_norm": 1.1124438047409058,
+ "learning_rate": 0.00017399999999999997,
+ "loss": 0.2985,
+ "step": 275
+ },
+ {
+ "epoch": 20.74,
+ "grad_norm": 0.9445765018463135,
+ "learning_rate": 0.00017169230769230768,
+ "loss": 0.2918,
+ "step": 280
+ },
+ {
+ "epoch": 20.74,
+ "eval_loss": 1.7201834917068481,
+ "eval_runtime": 0.4514,
+ "eval_samples_per_second": 59.813,
+ "eval_steps_per_second": 8.861,
+ "step": 280
+ },
+ {
+ "epoch": 21.11,
+ "grad_norm": 0.9067970514297485,
+ "learning_rate": 0.00016938461538461538,
+ "loss": 0.3118,
+ "step": 285
+ },
+ {
+ "epoch": 21.48,
+ "grad_norm": 1.0378901958465576,
+ "learning_rate": 0.00016707692307692308,
+ "loss": 0.3016,
+ "step": 290
+ },
+ {
+ "epoch": 21.85,
+ "grad_norm": 1.2347774505615234,
+ "learning_rate": 0.00016476923076923073,
+ "loss": 0.2904,
+ "step": 295
+ },
+ {
+ "epoch": 22.22,
+ "grad_norm": 0.9320012927055359,
+ "learning_rate": 0.00016246153846153843,
+ "loss": 0.3348,
+ "step": 300
+ },
+ {
+ "epoch": 22.22,
+ "eval_loss": 1.7654225826263428,
+ "eval_runtime": 0.4352,
+ "eval_samples_per_second": 62.036,
+ "eval_steps_per_second": 9.191,
+ "step": 300
+ },
+ {
+ "epoch": 22.59,
+ "grad_norm": 0.8344219326972961,
+ "learning_rate": 0.00016015384615384614,
+ "loss": 0.2905,
+ "step": 305
+ },
+ {
+ "epoch": 22.96,
+ "grad_norm": 1.3457179069519043,
+ "learning_rate": 0.00015784615384615384,
+ "loss": 0.3244,
+ "step": 310
+ },
+ {
+ "epoch": 23.33,
+ "grad_norm": 1.1215949058532715,
+ "learning_rate": 0.00015553846153846152,
+ "loss": 0.2964,
+ "step": 315
+ },
+ {
+ "epoch": 23.7,
+ "grad_norm": 0.8459953665733337,
+ "learning_rate": 0.00015323076923076922,
+ "loss": 0.3206,
+ "step": 320
+ },
+ {
+ "epoch": 23.7,
+ "eval_loss": 1.8162420988082886,
+ "eval_runtime": 0.4339,
+ "eval_samples_per_second": 62.219,
+ "eval_steps_per_second": 9.218,
+ "step": 320
+ },
+ {
+ "epoch": 24.07,
+ "grad_norm": 0.8673954010009766,
+ "learning_rate": 0.00015092307692307692,
+ "loss": 0.2768,
+ "step": 325
+ },
+ {
+ "epoch": 24.44,
+ "grad_norm": 0.9475287795066833,
+ "learning_rate": 0.0001486153846153846,
+ "loss": 0.2818,
+ "step": 330
+ },
+ {
+ "epoch": 24.81,
+ "grad_norm": 0.9035760164260864,
+ "learning_rate": 0.0001463076923076923,
+ "loss": 0.3097,
+ "step": 335
+ },
+ {
+ "epoch": 25.19,
+ "grad_norm": 0.8320503830909729,
+ "learning_rate": 0.00014399999999999998,
+ "loss": 0.2968,
+ "step": 340
+ },
+ {
+ "epoch": 25.19,
+ "eval_loss": 1.8249504566192627,
+ "eval_runtime": 0.4343,
+ "eval_samples_per_second": 62.165,
+ "eval_steps_per_second": 9.21,
+ "step": 340
+ },
+ {
+ "epoch": 25.56,
+ "grad_norm": 1.0484280586242676,
+ "learning_rate": 0.00014169230769230768,
+ "loss": 0.2661,
+ "step": 345
+ },
+ {
+ "epoch": 25.93,
+ "grad_norm": 1.3034342527389526,
+ "learning_rate": 0.00013938461538461536,
+ "loss": 0.3249,
+ "step": 350
+ },
+ {
+ "epoch": 26.3,
+ "grad_norm": 0.7918898463249207,
+ "learning_rate": 0.00013707692307692306,
+ "loss": 0.2881,
+ "step": 355
+ },
+ {
+ "epoch": 26.67,
+ "grad_norm": 0.8644436001777649,
+ "learning_rate": 0.00013476923076923076,
+ "loss": 0.3108,
+ "step": 360
+ },
+ {
+ "epoch": 26.67,
+ "eval_loss": 1.859883189201355,
+ "eval_runtime": 0.4339,
+ "eval_samples_per_second": 62.222,
+ "eval_steps_per_second": 9.218,
+ "step": 360
+ },
+ {
+ "epoch": 27.04,
+ "grad_norm": 0.8299930095672607,
+ "learning_rate": 0.00013246153846153846,
+ "loss": 0.2873,
+ "step": 365
+ },
+ {
+ "epoch": 27.41,
+ "grad_norm": 0.7016355991363525,
+ "learning_rate": 0.00013015384615384614,
+ "loss": 0.3059,
+ "step": 370
+ },
+ {
+ "epoch": 27.78,
+ "grad_norm": 0.9215915203094482,
+ "learning_rate": 0.00012784615384615384,
+ "loss": 0.2854,
+ "step": 375
+ },
+ {
+ "epoch": 28.15,
+ "grad_norm": 0.8328156471252441,
+ "learning_rate": 0.00012553846153846152,
+ "loss": 0.274,
+ "step": 380
+ },
+ {
+ "epoch": 28.15,
+ "eval_loss": 1.9108076095581055,
+ "eval_runtime": 0.4355,
+ "eval_samples_per_second": 62.003,
+ "eval_steps_per_second": 9.186,
+ "step": 380
+ },
+ {
+ "epoch": 28.52,
+ "grad_norm": 0.9325290322303772,
+ "learning_rate": 0.00012323076923076922,
+ "loss": 0.2775,
+ "step": 385
+ },
+ {
+ "epoch": 28.89,
+ "grad_norm": 0.882301926612854,
+ "learning_rate": 0.00012092307692307691,
+ "loss": 0.3149,
+ "step": 390
+ },
+ {
+ "epoch": 29.26,
+ "grad_norm": 0.7358686923980713,
+ "learning_rate": 0.0001186153846153846,
+ "loss": 0.2858,
+ "step": 395
+ },
+ {
+ "epoch": 29.63,
+ "grad_norm": 1.316838264465332,
+ "learning_rate": 0.00011630769230769229,
+ "loss": 0.2914,
+ "step": 400
+ },
+ {
+ "epoch": 29.63,
+ "eval_loss": 1.9297478199005127,
+ "eval_runtime": 0.4351,
+ "eval_samples_per_second": 62.054,
+ "eval_steps_per_second": 9.193,
+ "step": 400
+ },
+ {
+ "epoch": 30.0,
+ "grad_norm": 1.2439055442810059,
+ "learning_rate": 0.00011399999999999999,
+ "loss": 0.3069,
+ "step": 405
+ },
+ {
+ "epoch": 30.37,
+ "grad_norm": 0.6521609425544739,
+ "learning_rate": 0.00011169230769230768,
+ "loss": 0.2713,
+ "step": 410
+ },
+ {
+ "epoch": 30.74,
+ "grad_norm": 1.044469952583313,
+ "learning_rate": 0.00010938461538461537,
+ "loss": 0.2722,
+ "step": 415
+ },
+ {
+ "epoch": 31.11,
+ "grad_norm": 0.9478164911270142,
+ "learning_rate": 0.00010707692307692306,
+ "loss": 0.3216,
+ "step": 420
+ },
+ {
+ "epoch": 31.11,
+ "eval_loss": 1.9086456298828125,
+ "eval_runtime": 0.4351,
+ "eval_samples_per_second": 62.057,
+ "eval_steps_per_second": 9.194,
+ "step": 420
+ },
+ {
+ "epoch": 31.48,
+ "grad_norm": 0.8392044305801392,
+ "learning_rate": 0.00010476923076923076,
+ "loss": 0.2754,
+ "step": 425
+ },
+ {
+ "epoch": 31.85,
+ "grad_norm": 0.9244445562362671,
+ "learning_rate": 0.00010246153846153844,
+ "loss": 0.2899,
+ "step": 430
+ },
+ {
+ "epoch": 32.22,
+ "grad_norm": 0.7770041227340698,
+ "learning_rate": 0.00010015384615384614,
+ "loss": 0.3074,
+ "step": 435
+ },
+ {
+ "epoch": 32.59,
+ "grad_norm": 0.9049687385559082,
+ "learning_rate": 9.784615384615383e-05,
+ "loss": 0.2837,
+ "step": 440
+ },
+ {
+ "epoch": 32.59,
+ "eval_loss": 1.9283045530319214,
+ "eval_runtime": 0.436,
+ "eval_samples_per_second": 61.931,
+ "eval_steps_per_second": 9.175,
+ "step": 440
+ },
+ {
+ "epoch": 32.96,
+ "grad_norm": 0.9280600547790527,
+ "learning_rate": 9.553846153846153e-05,
+ "loss": 0.2832,
+ "step": 445
+ },
+ {
+ "epoch": 33.33,
+ "grad_norm": 1.0627598762512207,
+ "learning_rate": 9.323076923076921e-05,
+ "loss": 0.3019,
+ "step": 450
+ },
+ {
+ "epoch": 33.7,
+ "grad_norm": 0.8776496052742004,
+ "learning_rate": 9.092307692307691e-05,
+ "loss": 0.2691,
+ "step": 455
+ },
+ {
+ "epoch": 34.07,
+ "grad_norm": 1.0023120641708374,
+ "learning_rate": 8.861538461538462e-05,
+ "loss": 0.2908,
+ "step": 460
+ },
+ {
+ "epoch": 34.07,
+ "eval_loss": 1.9639641046524048,
+ "eval_runtime": 0.4354,
+ "eval_samples_per_second": 62.01,
+ "eval_steps_per_second": 9.187,
+ "step": 460
+ },
+ {
+ "epoch": 34.44,
+ "grad_norm": 0.7977310419082642,
+ "learning_rate": 8.63076923076923e-05,
+ "loss": 0.2448,
+ "step": 465
+ },
+ {
+ "epoch": 34.81,
+ "grad_norm": 0.941286563873291,
+ "learning_rate": 8.4e-05,
+ "loss": 0.3116,
+ "step": 470
+ },
+ {
+ "epoch": 35.19,
+ "grad_norm": 0.8777848482131958,
+ "learning_rate": 8.169230769230768e-05,
+ "loss": 0.297,
+ "step": 475
+ },
+ {
+ "epoch": 35.56,
+ "grad_norm": 1.213904857635498,
+ "learning_rate": 7.938461538461539e-05,
+ "loss": 0.2789,
+ "step": 480
+ },
+ {
+ "epoch": 35.56,
+ "eval_loss": 1.9745168685913086,
+ "eval_runtime": 0.4348,
+ "eval_samples_per_second": 62.101,
+ "eval_steps_per_second": 9.2,
+ "step": 480
+ },
+ {
+ "epoch": 35.93,
+ "grad_norm": 0.9805024266242981,
+ "learning_rate": 7.707692307692306e-05,
+ "loss": 0.2894,
+ "step": 485
+ },
+ {
+ "epoch": 36.3,
+ "grad_norm": 0.829560399055481,
+ "learning_rate": 7.476923076923077e-05,
+ "loss": 0.2597,
+ "step": 490
+ },
+ {
+ "epoch": 36.67,
+ "grad_norm": 1.1455888748168945,
+ "learning_rate": 7.246153846153846e-05,
+ "loss": 0.3081,
+ "step": 495
+ },
+ {
+ "epoch": 37.04,
+ "grad_norm": 0.9539237022399902,
+ "learning_rate": 7.015384615384615e-05,
+ "loss": 0.2773,
+ "step": 500
+ },
+ {
+ "epoch": 37.04,
+ "eval_loss": 1.9331037998199463,
+ "eval_runtime": 0.4366,
+ "eval_samples_per_second": 61.836,
+ "eval_steps_per_second": 9.161,
+ "step": 500
+ },
+ {
+ "epoch": 37.41,
+ "grad_norm": 0.8768445253372192,
+ "learning_rate": 6.784615384615383e-05,
+ "loss": 0.2595,
+ "step": 505
+ },
+ {
+ "epoch": 37.78,
+ "grad_norm": 0.9345956444740295,
+ "learning_rate": 6.553846153846154e-05,
+ "loss": 0.287,
+ "step": 510
+ },
+ {
+ "epoch": 38.15,
+ "grad_norm": 1.006366491317749,
+ "learning_rate": 6.323076923076923e-05,
+ "loss": 0.2988,
+ "step": 515
+ },
+ {
+ "epoch": 38.52,
+ "grad_norm": 0.999914824962616,
+ "learning_rate": 6.0923076923076916e-05,
+ "loss": 0.2734,
+ "step": 520
+ },
+ {
+ "epoch": 38.52,
+ "eval_loss": 1.9785383939743042,
+ "eval_runtime": 0.4355,
+ "eval_samples_per_second": 62.002,
+ "eval_steps_per_second": 9.186,
+ "step": 520
+ },
+ {
+ "epoch": 38.89,
+ "grad_norm": 0.9857865571975708,
+ "learning_rate": 5.8615384615384606e-05,
+ "loss": 0.288,
+ "step": 525
+ },
+ {
+ "epoch": 39.26,
+ "grad_norm": 1.0521814823150635,
+ "learning_rate": 5.63076923076923e-05,
+ "loss": 0.2844,
+ "step": 530
+ },
+ {
+ "epoch": 39.63,
+ "grad_norm": 0.8729674220085144,
+ "learning_rate": 5.399999999999999e-05,
+ "loss": 0.2641,
+ "step": 535
+ },
+ {
+ "epoch": 40.0,
+ "grad_norm": 1.0058977603912354,
+ "learning_rate": 5.169230769230769e-05,
+ "loss": 0.2916,
+ "step": 540
+ },
+ {
+ "epoch": 40.0,
+ "eval_loss": 2.0248498916625977,
+ "eval_runtime": 0.4342,
+ "eval_samples_per_second": 62.187,
+ "eval_steps_per_second": 9.213,
+ "step": 540
982
+ },
983
+ {
984
+ "epoch": 40.37,
985
+ "grad_norm": 1.0700985193252563,
986
+ "learning_rate": 4.938461538461538e-05,
987
+ "loss": 0.2675,
988
+ "step": 545
989
+ },
990
+ {
991
+ "epoch": 40.74,
992
+ "grad_norm": 0.9227583408355713,
993
+ "learning_rate": 4.707692307692307e-05,
994
+ "loss": 0.2607,
995
+ "step": 550
996
+ },
997
+ {
998
+ "epoch": 41.11,
999
+ "grad_norm": 0.658815324306488,
1000
+ "learning_rate": 4.476923076923076e-05,
1001
+ "loss": 0.2806,
1002
+ "step": 555
1003
+ },
1004
+ {
1005
+ "epoch": 41.48,
1006
+ "grad_norm": 0.849854052066803,
1007
+ "learning_rate": 4.246153846153846e-05,
1008
+ "loss": 0.2703,
1009
+ "step": 560
1010
+ },
1011
+ {
1012
+ "epoch": 41.48,
1013
+ "eval_loss": 1.9876518249511719,
1014
+ "eval_runtime": 0.4356,
1015
+ "eval_samples_per_second": 61.988,
1016
+ "eval_steps_per_second": 9.183,
1017
+ "step": 560
1018
+ },
1019
+ {
1020
+ "epoch": 41.85,
1021
+ "grad_norm": 1.0666935443878174,
1022
+ "learning_rate": 4.015384615384615e-05,
1023
+ "loss": 0.2946,
1024
+ "step": 565
1025
+ },
1026
+ {
1027
+ "epoch": 42.22,
1028
+ "grad_norm": 1.0425372123718262,
1029
+ "learning_rate": 3.784615384615384e-05,
1030
+ "loss": 0.2973,
1031
+ "step": 570
1032
+ },
1033
+ {
1034
+ "epoch": 42.59,
1035
+ "grad_norm": 1.135412335395813,
1036
+ "learning_rate": 3.553846153846153e-05,
1037
+ "loss": 0.2846,
1038
+ "step": 575
1039
+ },
1040
+ {
1041
+ "epoch": 42.96,
1042
+ "grad_norm": 0.9318748712539673,
1043
+ "learning_rate": 3.323076923076923e-05,
1044
+ "loss": 0.2608,
1045
+ "step": 580
1046
+ },
1047
+ {
1048
+ "epoch": 42.96,
1049
+ "eval_loss": 2.014052391052246,
1050
+ "eval_runtime": 0.4365,
1051
+ "eval_samples_per_second": 61.863,
1052
+ "eval_steps_per_second": 9.165,
1053
+ "step": 580
1054
+ },
1055
+ {
1056
+ "epoch": 43.33,
1057
+ "grad_norm": 0.740696370601654,
1058
+ "learning_rate": 3.092307692307692e-05,
1059
+ "loss": 0.2745,
1060
+ "step": 585
1061
+ },
1062
+ {
1063
+ "epoch": 43.7,
1064
+ "grad_norm": 1.0200663805007935,
1065
+ "learning_rate": 2.8615384615384615e-05,
1066
+ "loss": 0.2722,
1067
+ "step": 590
1068
+ },
1069
+ {
1070
+ "epoch": 44.07,
1071
+ "grad_norm": 0.9688575863838196,
1072
+ "learning_rate": 2.6307692307692304e-05,
1073
+ "loss": 0.2811,
1074
+ "step": 595
1075
+ },
1076
+ {
1077
+ "epoch": 44.44,
1078
+ "grad_norm": 1.2867329120635986,
1079
+ "learning_rate": 2.3999999999999997e-05,
1080
+ "loss": 0.262,
1081
+ "step": 600
1082
+ },
1083
+ {
1084
+ "epoch": 44.44,
1085
+ "eval_loss": 2.0364649295806885,
1086
+ "eval_runtime": 0.4343,
1087
+ "eval_samples_per_second": 62.168,
1088
+ "eval_steps_per_second": 9.21,
1089
+ "step": 600
1090
+ },
1091
+ {
1092
+ "epoch": 44.81,
1093
+ "grad_norm": 1.0685030221939087,
1094
+ "learning_rate": 2.169230769230769e-05,
1095
+ "loss": 0.2679,
1096
+ "step": 605
1097
+ },
1098
+ {
1099
+ "epoch": 45.19,
1100
+ "grad_norm": 0.9567782282829285,
1101
+ "learning_rate": 1.9384615384615383e-05,
1102
+ "loss": 0.2855,
1103
+ "step": 610
1104
+ },
1105
+ {
1106
+ "epoch": 45.56,
1107
+ "grad_norm": 0.8821234703063965,
1108
+ "learning_rate": 1.7076923076923076e-05,
1109
+ "loss": 0.2807,
1110
+ "step": 615
1111
+ },
1112
+ {
1113
+ "epoch": 45.93,
1114
+ "grad_norm": 1.212229609489441,
1115
+ "learning_rate": 1.4769230769230768e-05,
1116
+ "loss": 0.2767,
1117
+ "step": 620
1118
+ },
1119
+ {
1120
+ "epoch": 45.93,
1121
+ "eval_loss": 2.0518877506256104,
1122
+ "eval_runtime": 0.4334,
1123
+ "eval_samples_per_second": 62.291,
1124
+ "eval_steps_per_second": 9.228,
1125
+ "step": 620
1126
+ },
1127
+ {
1128
+ "epoch": 46.3,
1129
+ "grad_norm": 1.0351111888885498,
1130
+ "learning_rate": 1.2461538461538461e-05,
1131
+ "loss": 0.2699,
1132
+ "step": 625
1133
+ },
1134
+ {
1135
+ "epoch": 46.67,
1136
+ "grad_norm": 1.1969187259674072,
1137
+ "learning_rate": 1.0153846153846152e-05,
1138
+ "loss": 0.277,
1139
+ "step": 630
1140
+ },
1141
+ {
1142
+ "epoch": 47.04,
1143
+ "grad_norm": 0.7708118557929993,
1144
+ "learning_rate": 7.846153846153845e-06,
1145
+ "loss": 0.2637,
1146
+ "step": 635
1147
+ },
1148
+ {
1149
+ "epoch": 47.41,
1150
+ "grad_norm": 0.8004487752914429,
1151
+ "learning_rate": 5.5384615384615385e-06,
1152
+ "loss": 0.2642,
1153
+ "step": 640
1154
+ },
1155
+ {
1156
+ "epoch": 47.41,
1157
+ "eval_loss": 2.0601537227630615,
1158
+ "eval_runtime": 0.4355,
1159
+ "eval_samples_per_second": 61.996,
1160
+ "eval_steps_per_second": 9.185,
1161
+ "step": 640
1162
+ }
1163
+ ],
1164
+ "logging_steps": 5,
1165
+ "max_steps": 650,
1166
+ "num_input_tokens_seen": 0,
1167
+ "num_train_epochs": 50,
1168
+ "save_steps": 20,
1169
+ "total_flos": 5533698562129920.0,
1170
+ "train_batch_size": 4,
1171
+ "trial_name": null,
1172
+ "trial_params": null
1173
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a957e5a99d0748cd66c2f581dde5af79e6b3afeea680a16092a23641c1e986c1
+ size 5048
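A note on reading the trainer state above: the `log_history` entries with an `eval_loss` key mark evaluation points, and since `save_steps` is 20 every evaluated step also has a saved checkpoint. The eval loss in this run bottoms out at step 500 and drifts upward afterwards, so a minimal sketch for picking the best checkpoint might look like this (the entries below are copied from the diff; in practice you would `json.load` the full `trainer_state.json` from the checkpoint directory):

```python
# A few eval entries copied from the trainer_state.json diff above;
# the real file holds the complete log_history list.
log_history = [
    {"epoch": 37.04, "eval_loss": 1.9331037998199463, "step": 500},
    {"epoch": 38.52, "eval_loss": 1.9785383939743042, "step": 520},
    {"epoch": 40.0, "eval_loss": 2.0248498916625977, "step": 540},
    {"epoch": 47.41, "eval_loss": 2.0601537227630615, "step": 640},
]

# Keep only entries that carry an eval loss (training-loss entries lack the key),
# then pick the checkpoint with the lowest eval loss.
evals = [e for e in log_history if "eval_loss" in e]
best = min(evals, key=lambda e: e["eval_loss"])
print(best["step"], best["eval_loss"])  # prints: 500 1.9331037998199463
```

This is only a sketch against the values logged here, not part of the uploaded files; with the full log, the same `min` over eval entries would select the checkpoint to keep.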