Niraya666 committed on
Commit
6f739ef
1 Parent(s): fa74955

End of training

README.md ADDED
@@ -0,0 +1,277 @@
+ ---
+ license: apache-2.0
+ base_model: microsoft/swin-tiny-patch4-window7-224
+ tags:
+ - generated_from_trainer
+ datasets:
+ - imagefolder
+ metrics:
+ - accuracy
+ model-index:
+ - name: swin-tiny-patch4-window7-224-finetuned-ADC-4cls-0922
+   results:
+   - task:
+       name: Image Classification
+       type: image-classification
+     dataset:
+       name: imagefolder
+       type: imagefolder
+       config: default
+       split: test
+       args: default
+     metrics:
+     - name: Accuracy
+       type: accuracy
+       value: 0.7
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # swin-tiny-patch4-window7-224-finetuned-ADC-4cls-0922
+
+ This model is a fine-tuned version of [microsoft/swin-tiny-patch4-window7-224](https://huggingface.co/microsoft/swin-tiny-patch4-window7-224) on the imagefolder dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.8947
+ - Accuracy: 0.7
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0001
+ - train_batch_size: 64
+ - eval_batch_size: 64
+ - seed: 42
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 256
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.2
+ - num_epochs: 200
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Accuracy |
+ |:-------------:|:-----:|:----:|:---------------:|:--------:|
+ | No log | 1.0 | 2 | 0.9655 | 0.6714 |
+ | No log | 2.0 | 4 | 0.9654 | 0.6571 |
+ | No log | 3.0 | 6 | 0.9651 | 0.6571 |
+ | No log | 4.0 | 8 | 0.9647 | 0.6571 |
+ | 1.0064 | 5.0 | 10 | 0.9641 | 0.6571 |
+ | 1.0064 | 6.0 | 12 | 0.9635 | 0.6571 |
+ | 1.0064 | 7.0 | 14 | 0.9629 | 0.6571 |
+ | 1.0064 | 8.0 | 16 | 0.9623 | 0.6571 |
+ | 1.0064 | 9.0 | 18 | 0.9617 | 0.6571 |
+ | 0.9821 | 10.0 | 20 | 0.9611 | 0.6571 |
+ | 0.9821 | 11.0 | 22 | 0.9607 | 0.6571 |
+ | 0.9821 | 12.0 | 24 | 0.9604 | 0.6714 |
+ | 0.9821 | 13.0 | 26 | 0.9601 | 0.6714 |
+ | 0.9821 | 14.0 | 28 | 0.9597 | 0.6714 |
+ | 1.0278 | 15.0 | 30 | 0.9592 | 0.6714 |
+ | 1.0278 | 16.0 | 32 | 0.9581 | 0.6714 |
+ | 1.0278 | 17.0 | 34 | 0.9567 | 0.6714 |
+ | 1.0278 | 18.0 | 36 | 0.9551 | 0.6714 |
+ | 1.0278 | 19.0 | 38 | 0.9534 | 0.6714 |
+ | 0.9986 | 20.0 | 40 | 0.9514 | 0.6571 |
+ | 0.9986 | 21.0 | 42 | 0.9493 | 0.6571 |
+ | 0.9986 | 22.0 | 44 | 0.9472 | 0.6429 |
+ | 0.9986 | 23.0 | 46 | 0.9452 | 0.6429 |
+ | 0.9986 | 24.0 | 48 | 0.9434 | 0.6429 |
+ | 0.9973 | 25.0 | 50 | 0.9420 | 0.6429 |
+ | 0.9973 | 26.0 | 52 | 0.9405 | 0.6429 |
+ | 0.9973 | 27.0 | 54 | 0.9387 | 0.6286 |
+ | 0.9973 | 28.0 | 56 | 0.9376 | 0.6286 |
+ | 0.9973 | 29.0 | 58 | 0.9368 | 0.6429 |
+ | 0.9936 | 30.0 | 60 | 0.9362 | 0.6429 |
+ | 0.9936 | 31.0 | 62 | 0.9361 | 0.6571 |
+ | 0.9936 | 32.0 | 64 | 0.9364 | 0.6714 |
+ | 0.9936 | 33.0 | 66 | 0.9371 | 0.6714 |
+ | 0.9936 | 34.0 | 68 | 0.9380 | 0.6429 |
+ | 0.9746 | 35.0 | 70 | 0.9380 | 0.6571 |
+ | 0.9746 | 36.0 | 72 | 0.9375 | 0.6714 |
+ | 0.9746 | 37.0 | 74 | 0.9380 | 0.6714 |
+ | 0.9746 | 38.0 | 76 | 0.9375 | 0.6714 |
+ | 0.9746 | 39.0 | 78 | 0.9370 | 0.6714 |
+ | 1.0113 | 40.0 | 80 | 0.9362 | 0.6714 |
+ | 1.0113 | 41.0 | 82 | 0.9341 | 0.6714 |
+ | 1.0113 | 42.0 | 84 | 0.9301 | 0.6857 |
+ | 1.0113 | 43.0 | 86 | 0.9260 | 0.6714 |
+ | 1.0113 | 44.0 | 88 | 0.9224 | 0.6571 |
+ | 0.9756 | 45.0 | 90 | 0.9190 | 0.6714 |
+ | 0.9756 | 46.0 | 92 | 0.9154 | 0.6714 |
+ | 0.9756 | 47.0 | 94 | 0.9123 | 0.6714 |
+ | 0.9756 | 48.0 | 96 | 0.9091 | 0.6571 |
+ | 0.9756 | 49.0 | 98 | 0.9071 | 0.6571 |
+ | 0.9721 | 50.0 | 100 | 0.9056 | 0.6571 |
+ | 0.9721 | 51.0 | 102 | 0.9047 | 0.6571 |
+ | 0.9721 | 52.0 | 104 | 0.9039 | 0.6571 |
+ | 0.9721 | 53.0 | 106 | 0.9031 | 0.6714 |
+ | 0.9721 | 54.0 | 108 | 0.9025 | 0.6714 |
+ | 0.9698 | 55.0 | 110 | 0.9023 | 0.6714 |
+ | 0.9698 | 56.0 | 112 | 0.9012 | 0.6714 |
+ | 0.9698 | 57.0 | 114 | 0.8997 | 0.6714 |
+ | 0.9698 | 58.0 | 116 | 0.8982 | 0.6714 |
+ | 0.9698 | 59.0 | 118 | 0.8970 | 0.6714 |
+ | 0.9341 | 60.0 | 120 | 0.8957 | 0.6857 |
+ | 0.9341 | 61.0 | 122 | 0.8947 | 0.7 |
+ | 0.9341 | 62.0 | 124 | 0.8940 | 0.7 |
+ | 0.9341 | 63.0 | 126 | 0.8941 | 0.6714 |
+ | 0.9341 | 64.0 | 128 | 0.8934 | 0.6714 |
+ | 0.9717 | 65.0 | 130 | 0.8917 | 0.6714 |
+ | 0.9717 | 66.0 | 132 | 0.8898 | 0.6857 |
+ | 0.9717 | 67.0 | 134 | 0.8884 | 0.6857 |
+ | 0.9717 | 68.0 | 136 | 0.8870 | 0.6857 |
+ | 0.9717 | 69.0 | 138 | 0.8854 | 0.6857 |
+ | 0.9655 | 70.0 | 140 | 0.8840 | 0.6857 |
+ | 0.9655 | 71.0 | 142 | 0.8827 | 0.6857 |
+ | 0.9655 | 72.0 | 144 | 0.8814 | 0.6857 |
+ | 0.9655 | 73.0 | 146 | 0.8805 | 0.6857 |
+ | 0.9655 | 74.0 | 148 | 0.8803 | 0.6857 |
+ | 0.9458 | 75.0 | 150 | 0.8802 | 0.6857 |
+ | 0.9458 | 76.0 | 152 | 0.8797 | 0.6714 |
+ | 0.9458 | 77.0 | 154 | 0.8794 | 0.6714 |
+ | 0.9458 | 78.0 | 156 | 0.8796 | 0.6714 |
+ | 0.9458 | 79.0 | 158 | 0.8808 | 0.6714 |
+ | 0.9094 | 80.0 | 160 | 0.8817 | 0.6714 |
+ | 0.9094 | 81.0 | 162 | 0.8828 | 0.6714 |
+ | 0.9094 | 82.0 | 164 | 0.8836 | 0.6714 |
+ | 0.9094 | 83.0 | 166 | 0.8830 | 0.6714 |
+ | 0.9094 | 84.0 | 168 | 0.8821 | 0.6571 |
+ | 0.8719 | 85.0 | 170 | 0.8813 | 0.6571 |
+ | 0.8719 | 86.0 | 172 | 0.8804 | 0.6714 |
+ | 0.8719 | 87.0 | 174 | 0.8798 | 0.6571 |
+ | 0.8719 | 88.0 | 176 | 0.8787 | 0.6571 |
+ | 0.8719 | 89.0 | 178 | 0.8770 | 0.6571 |
+ | 0.9288 | 90.0 | 180 | 0.8752 | 0.6857 |
+ | 0.9288 | 91.0 | 182 | 0.8722 | 0.6857 |
+ | 0.9288 | 92.0 | 184 | 0.8694 | 0.6714 |
+ | 0.9288 | 93.0 | 186 | 0.8670 | 0.6714 |
+ | 0.9288 | 94.0 | 188 | 0.8645 | 0.6857 |
+ | 0.9039 | 95.0 | 190 | 0.8624 | 0.6857 |
+ | 0.9039 | 96.0 | 192 | 0.8603 | 0.6714 |
+ | 0.9039 | 97.0 | 194 | 0.8584 | 0.6857 |
+ | 0.9039 | 98.0 | 196 | 0.8566 | 0.6857 |
+ | 0.9039 | 99.0 | 198 | 0.8553 | 0.6857 |
+ | 0.9081 | 100.0 | 200 | 0.8550 | 0.6857 |
+ | 0.9081 | 101.0 | 202 | 0.8551 | 0.6857 |
+ | 0.9081 | 102.0 | 204 | 0.8556 | 0.6857 |
+ | 0.9081 | 103.0 | 206 | 0.8558 | 0.6857 |
+ | 0.9081 | 104.0 | 208 | 0.8554 | 0.6857 |
+ | 0.9142 | 105.0 | 210 | 0.8551 | 0.6857 |
+ | 0.9142 | 106.0 | 212 | 0.8553 | 0.6857 |
+ | 0.9142 | 107.0 | 214 | 0.8551 | 0.6857 |
+ | 0.9142 | 108.0 | 216 | 0.8549 | 0.6857 |
+ | 0.9142 | 109.0 | 218 | 0.8549 | 0.6857 |
+ | 0.9347 | 110.0 | 220 | 0.8551 | 0.6714 |
+ | 0.9347 | 111.0 | 222 | 0.8554 | 0.6714 |
+ | 0.9347 | 112.0 | 224 | 0.8548 | 0.6714 |
+ | 0.9347 | 113.0 | 226 | 0.8538 | 0.6714 |
+ | 0.9347 | 114.0 | 228 | 0.8525 | 0.6714 |
+ | 0.8922 | 115.0 | 230 | 0.8512 | 0.6857 |
+ | 0.8922 | 116.0 | 232 | 0.8505 | 0.6857 |
+ | 0.8922 | 117.0 | 234 | 0.8495 | 0.6857 |
+ | 0.8922 | 118.0 | 236 | 0.8484 | 0.6857 |
+ | 0.8922 | 119.0 | 238 | 0.8472 | 0.6857 |
+ | 0.8897 | 120.0 | 240 | 0.8456 | 0.6857 |
+ | 0.8897 | 121.0 | 242 | 0.8440 | 0.6857 |
+ | 0.8897 | 122.0 | 244 | 0.8426 | 0.6714 |
+ | 0.8897 | 123.0 | 246 | 0.8412 | 0.6857 |
+ | 0.8897 | 124.0 | 248 | 0.8396 | 0.6857 |
+ | 0.8829 | 125.0 | 250 | 0.8384 | 0.6857 |
+ | 0.8829 | 126.0 | 252 | 0.8373 | 0.6857 |
+ | 0.8829 | 127.0 | 254 | 0.8365 | 0.6857 |
+ | 0.8829 | 128.0 | 256 | 0.8360 | 0.6857 |
+ | 0.8829 | 129.0 | 258 | 0.8353 | 0.6857 |
+ | 0.8744 | 130.0 | 260 | 0.8344 | 0.6857 |
+ | 0.8744 | 131.0 | 262 | 0.8337 | 0.6714 |
+ | 0.8744 | 132.0 | 264 | 0.8329 | 0.6857 |
+ | 0.8744 | 133.0 | 266 | 0.8325 | 0.6857 |
+ | 0.8744 | 134.0 | 268 | 0.8318 | 0.6857 |
+ | 0.8657 | 135.0 | 270 | 0.8312 | 0.6857 |
+ | 0.8657 | 136.0 | 272 | 0.8306 | 0.6714 |
+ | 0.8657 | 137.0 | 274 | 0.8300 | 0.6714 |
+ | 0.8657 | 138.0 | 276 | 0.8296 | 0.6714 |
+ | 0.8657 | 139.0 | 278 | 0.8294 | 0.6714 |
+ | 0.9421 | 140.0 | 280 | 0.8292 | 0.6714 |
+ | 0.9421 | 141.0 | 282 | 0.8291 | 0.6714 |
+ | 0.9421 | 142.0 | 284 | 0.8290 | 0.6714 |
+ | 0.9421 | 143.0 | 286 | 0.8290 | 0.6857 |
+ | 0.9421 | 144.0 | 288 | 0.8289 | 0.6857 |
+ | 0.9066 | 145.0 | 290 | 0.8287 | 0.6857 |
+ | 0.9066 | 146.0 | 292 | 0.8290 | 0.6857 |
+ | 0.9066 | 147.0 | 294 | 0.8293 | 0.6857 |
+ | 0.9066 | 148.0 | 296 | 0.8294 | 0.6857 |
+ | 0.9066 | 149.0 | 298 | 0.8295 | 0.6857 |
+ | 0.9068 | 150.0 | 300 | 0.8295 | 0.6857 |
+ | 0.9068 | 151.0 | 302 | 0.8294 | 0.6857 |
+ | 0.9068 | 152.0 | 304 | 0.8293 | 0.6857 |
+ | 0.9068 | 153.0 | 306 | 0.8293 | 0.6857 |
+ | 0.9068 | 154.0 | 308 | 0.8290 | 0.6857 |
+ | 0.8715 | 155.0 | 310 | 0.8287 | 0.6857 |
+ | 0.8715 | 156.0 | 312 | 0.8283 | 0.6857 |
+ | 0.8715 | 157.0 | 314 | 0.8277 | 0.6857 |
+ | 0.8715 | 158.0 | 316 | 0.8274 | 0.6857 |
+ | 0.8715 | 159.0 | 318 | 0.8269 | 0.6857 |
+ | 0.8921 | 160.0 | 320 | 0.8266 | 0.6857 |
+ | 0.8921 | 161.0 | 322 | 0.8264 | 0.6857 |
+ | 0.8921 | 162.0 | 324 | 0.8261 | 0.6857 |
+ | 0.8921 | 163.0 | 326 | 0.8260 | 0.6857 |
+ | 0.8921 | 164.0 | 328 | 0.8258 | 0.6857 |
+ | 0.8768 | 165.0 | 330 | 0.8252 | 0.6857 |
+ | 0.8768 | 166.0 | 332 | 0.8248 | 0.6857 |
+ | 0.8768 | 167.0 | 334 | 0.8243 | 0.6857 |
+ | 0.8768 | 168.0 | 336 | 0.8237 | 0.6857 |
+ | 0.8768 | 169.0 | 338 | 0.8231 | 0.6857 |
+ | 0.8519 | 170.0 | 340 | 0.8227 | 0.6857 |
+ | 0.8519 | 171.0 | 342 | 0.8223 | 0.6857 |
+ | 0.8519 | 172.0 | 344 | 0.8221 | 0.6857 |
+ | 0.8519 | 173.0 | 346 | 0.8220 | 0.6857 |
+ | 0.8519 | 174.0 | 348 | 0.8218 | 0.6857 |
+ | 0.92 | 175.0 | 350 | 0.8215 | 0.6857 |
+ | 0.92 | 176.0 | 352 | 0.8211 | 0.7 |
+ | 0.92 | 177.0 | 354 | 0.8207 | 0.7 |
+ | 0.92 | 178.0 | 356 | 0.8204 | 0.7 |
+ | 0.92 | 179.0 | 358 | 0.8200 | 0.7 |
+ | 0.879 | 180.0 | 360 | 0.8197 | 0.7 |
+ | 0.879 | 181.0 | 362 | 0.8194 | 0.7 |
+ | 0.879 | 182.0 | 364 | 0.8191 | 0.6857 |
+ | 0.879 | 183.0 | 366 | 0.8187 | 0.6857 |
+ | 0.879 | 184.0 | 368 | 0.8185 | 0.7 |
+ | 0.8893 | 185.0 | 370 | 0.8182 | 0.7 |
+ | 0.8893 | 186.0 | 372 | 0.8180 | 0.7 |
+ | 0.8893 | 187.0 | 374 | 0.8177 | 0.7 |
+ | 0.8893 | 188.0 | 376 | 0.8176 | 0.7 |
+ | 0.8893 | 189.0 | 378 | 0.8175 | 0.7 |
+ | 0.8501 | 190.0 | 380 | 0.8173 | 0.7 |
+ | 0.8501 | 191.0 | 382 | 0.8171 | 0.7 |
+ | 0.8501 | 192.0 | 384 | 0.8170 | 0.7 |
+ | 0.8501 | 193.0 | 386 | 0.8169 | 0.7 |
+ | 0.8501 | 194.0 | 388 | 0.8169 | 0.7 |
+ | 0.8611 | 195.0 | 390 | 0.8168 | 0.7 |
+ | 0.8611 | 196.0 | 392 | 0.8168 | 0.7 |
+ | 0.8611 | 197.0 | 394 | 0.8168 | 0.7 |
+ | 0.8611 | 198.0 | 396 | 0.8168 | 0.7 |
+ | 0.8611 | 199.0 | 398 | 0.8168 | 0.7 |
+ | 0.8881 | 200.0 | 400 | 0.8168 | 0.7 |
+
+
+ ### Framework versions
+
+ - Transformers 4.33.2
+ - Pytorch 2.0.1+cu118
+ - Datasets 2.14.5
+ - Tokenizers 0.13.3
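
Note on the schedule: the hyperparameters in the README (learning_rate 0.0001, lr_scheduler_type linear, lr_scheduler_warmup_ratio 0.2) together with the 400 optimizer steps in the results table imply 80 warmup steps followed by a linear decay to zero. The sketch below is an illustrative reimplementation in plain Python, not the Trainer's own code; the function name `learning_rate_at` and the use of `round` for the warmup step count are assumptions. It reproduces the learning rates recorded in the `log_history` of trainer_state.json (for example 1.25e-05 at step 10 and 9.6875e-05 at step 90).

```python
def learning_rate_at(step, base_lr=1e-4, total_steps=400, warmup_ratio=0.2):
    """Linear warmup then linear decay to zero (sketch of `lr_scheduler_type: linear`)."""
    # Assumption: warmup length is total_steps * warmup_ratio, i.e. 80 steps here.
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from base_lr down to 0 at total_steps.
    decay_steps = max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / decay_steps)


# Effective batch size from the README: per-device batch * accumulation steps
# (assuming a single device).
effective_batch = 64 * 4

for step in (10, 20, 80, 90, 100, 400):
    print(step, learning_rate_at(step))
```

The peak learning rate is reached at step 80 (epoch 40), which is where the validation loss in the table starts to fall more quickly.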
all_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+     "epoch": 200.0,
+     "eval_accuracy": 0.7,
+     "eval_loss": 0.8946982622146606,
+     "eval_runtime": 0.7238,
+     "eval_samples_per_second": 96.708,
+     "eval_steps_per_second": 2.763,
+     "total_flos": 2.2371640252416e+18,
+     "train_loss": 0.9259392237663269,
+     "train_runtime": 1042.9233,
+     "train_samples_per_second": 86.296,
+     "train_steps_per_second": 0.384
+ }
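
Note on dataset size: the commit does not state how many images were used, but the throughput figures in all_results.json allow a rough back-calculation. The sketch below assumes that `*_samples_per_second` is simply samples divided by runtime (and, for training, counts samples over all 200 epochs); the variable names are illustrative. Under those assumptions the evaluation split has about 70 images and the training split about 450, so each evaluation image moves accuracy by roughly 1.4 points.

```python
# Values copied from all_results.json in this commit.
train = {"epochs": 200.0, "runtime": 1042.9233, "samples_per_second": 86.296}
evaluation = {"runtime": 0.7238, "samples_per_second": 96.708, "accuracy": 0.7}

# Assumption: throughput = samples / runtime, so samples = throughput * runtime.
eval_samples = round(evaluation["samples_per_second"] * evaluation["runtime"])

# Training throughput covers every epoch, so divide by the epoch count.
train_samples = round(train["samples_per_second"] * train["runtime"] / train["epochs"])

# Number of correctly classified evaluation images implied by the accuracy.
correct = round(evaluation["accuracy"] * eval_samples)

print(f"eval samples  ~ {eval_samples}")
print(f"train samples ~ {train_samples}")
print(f"correct       ~ {correct} of {eval_samples}")
print(f"one image     = {1 / eval_samples:.4f} accuracy")
```

This is consistent with the accuracy column in the README table, whose values (0.6286, 0.6429, 0.6571, 0.6714, 0.6857, 0.7) are all multiples of 1/70.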
config.json ADDED
@@ -0,0 +1,65 @@
+ {
+   "_name_or_path": "microsoft/swin-tiny-patch4-window7-224",
+   "architectures": [
+     "SwinForImageClassification"
+   ],
+   "attention_probs_dropout_prob": 0.0,
+   "depths": [
+     2,
+     2,
+     6,
+     2
+   ],
+   "drop_path_rate": 0.1,
+   "embed_dim": 96,
+   "encoder_stride": 32,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.0,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "Color",
+     "1": "Pattern_fail",
+     "2": "Residue",
+     "3": "Tiny"
+   },
+   "image_size": 224,
+   "initializer_range": 0.02,
+   "label2id": {
+     "Color": 0,
+     "Pattern_fail": 1,
+     "Residue": 2,
+     "Tiny": 3
+   },
+   "layer_norm_eps": 1e-05,
+   "mlp_ratio": 4.0,
+   "model_type": "swin",
+   "num_channels": 3,
+   "num_heads": [
+     3,
+     6,
+     12,
+     24
+   ],
+   "num_layers": 4,
+   "out_features": [
+     "stage4"
+   ],
+   "out_indices": [
+     4
+   ],
+   "patch_size": 4,
+   "path_norm": true,
+   "problem_type": "single_label_classification",
+   "qkv_bias": true,
+   "stage_names": [
+     "stem",
+     "stage1",
+     "stage2",
+     "stage3",
+     "stage4"
+   ],
+   "torch_dtype": "float32",
+   "transformers_version": "4.33.2",
+   "use_absolute_embeddings": false,
+   "window_size": 7
+ }
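
Note on the config: two things in config.json can be checked or used without loading the model. First, `hidden_size` follows from `embed_dim` and the number of stages, since the channel width doubles at each of the three stage transitions. Second, `id2label` maps the four output logits to the defect class names. The sketch below is plain Python under those assumptions; `predict_label` is a hypothetical helper and the logits are made-up values for illustration, not model output.

```python
import math

# Values copied from config.json in this commit.
embed_dim = 96
depths = [2, 2, 6, 2]
id2label = {0: "Color", 1: "Pattern_fail", 2: "Residue", 3: "Tiny"}

# Channel width doubles after each stage except the last: 96 -> 192 -> 384 -> 768.
hidden_size = embed_dim * 2 ** (len(depths) - 1)


def predict_label(logits):
    """Hypothetical helper: softmax over the logits, then look up the argmax class."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Made-up logits, for illustration only.
label, prob = predict_label([0.1, 2.0, -1.0, 0.3])
print(hidden_size, label, round(prob, 3))
```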
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 200.0,
+     "eval_accuracy": 0.7,
+     "eval_loss": 0.8946982622146606,
+     "eval_runtime": 0.7238,
+     "eval_samples_per_second": 96.708,
+     "eval_steps_per_second": 2.763
+ }
preprocessor_config.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "image_mean": [
+     0.485,
+     0.456,
+     0.406
+   ],
+   "image_processor_type": "ViTImageProcessor",
+   "image_std": [
+     0.229,
+     0.224,
+     0.225
+   ],
+   "resample": 3,
+   "rescale_factor": 0.00392156862745098,
+   "size": {
+     "height": 224,
+     "width": 224
+   }
+ }
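
Note on preprocessing: preprocessor_config.json describes a resize to 224x224 (`resample` 3 corresponds to bicubic in Pillow's numbering), a rescale by 1/255, and a per-channel normalization with the usual ImageNet mean and standard deviation. The sketch below shows only the rescale and normalize arithmetic on a single RGB pixel in plain Python; `normalize_pixel` is a hypothetical helper, and the real image processor operates on whole arrays.

```python
# Values copied from preprocessor_config.json in this commit.
RESCALE_FACTOR = 0.00392156862745098  # equals 1 / 255
IMAGE_MEAN = [0.485, 0.456, 0.406]
IMAGE_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Hypothetical helper: map one 8-bit RGB pixel to normalized float channels."""
    return [
        (value * RESCALE_FACTOR - mean) / std
        for value, mean, std in zip(rgb, IMAGE_MEAN, IMAGE_STD)
    ]


black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
print([round(c, 4) for c in black])
print([round(c, 4) for c in white])
```

Inputs prepared any other way (for example left in the 0 to 255 range) would not match what the model saw during fine-tuning.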
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:62a1f5e3012f027df146a06447599ad9dc2f8df67b5acdfa20294a637223905e
+ size 110401009
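
Note on the weights file: pytorch_model.bin is stored through Git LFS, so the diff shows only a three-line pointer (spec version, SHA-256 object id, size in bytes). The sketch below parses that pointer in plain Python; `parse_lfs_pointer` is a hypothetical helper. Dividing the size by four gives a rough upper bound on the number of float32 values, assuming the file is dominated by float32 tensors as `torch_dtype` in config.json suggests; serialization overhead means the true parameter count is somewhat lower.

```python
# Pointer text copied from this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:62a1f5e3012f027df146a06447599ad9dc2f8df67b5acdfa20294a637223905e
size 110401009
"""


def parse_lfs_pointer(text):
    """Hypothetical helper: split each 'key value' line of an LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])
    return fields


pointer = parse_lfs_pointer(POINTER)
size_mib = pointer["size"] / 2**20

# Rough upper bound: 4 bytes per float32 value, ignoring file overhead.
max_float32_values = pointer["size"] // 4

print(pointer["oid"])
print(f"{size_mib:.1f} MiB, at most {max_float32_values:,} float32 values")
```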
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 200.0,
+     "total_flos": 2.2371640252416e+18,
+     "train_loss": 0.9259392237663269,
+     "train_runtime": 1042.9233,
+     "train_samples_per_second": 86.296,
+     "train_steps_per_second": 0.384
+ }
trainer_state.json ADDED
@@ -0,0 +1,2068 @@
1
+ {
2
+ "best_metric": 0.7,
3
+ "best_model_checkpoint": "swin-tiny-patch4-window7-224-finetuned-ADC-4cls-0922/checkpoint-122",
4
+ "epoch": 200.0,
5
+ "eval_steps": 500,
6
+ "global_step": 400,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 1.0,
13
+ "eval_accuracy": 0.6714285714285714,
14
+ "eval_loss": 0.9655490517616272,
15
+ "eval_runtime": 0.8298,
16
+ "eval_samples_per_second": 84.356,
17
+ "eval_steps_per_second": 2.41,
18
+ "step": 2
19
+ },
20
+ {
21
+ "epoch": 2.0,
22
+ "eval_accuracy": 0.6571428571428571,
23
+ "eval_loss": 0.9653854370117188,
24
+ "eval_runtime": 0.6383,
25
+ "eval_samples_per_second": 109.671,
26
+ "eval_steps_per_second": 3.133,
27
+ "step": 4
28
+ },
29
+ {
30
+ "epoch": 3.0,
31
+ "eval_accuracy": 0.6571428571428571,
32
+ "eval_loss": 0.9650949835777283,
33
+ "eval_runtime": 0.6412,
34
+ "eval_samples_per_second": 109.167,
35
+ "eval_steps_per_second": 3.119,
36
+ "step": 6
37
+ },
38
+ {
39
+ "epoch": 4.0,
40
+ "eval_accuracy": 0.6571428571428571,
41
+ "eval_loss": 0.9646532535552979,
42
+ "eval_runtime": 0.8218,
43
+ "eval_samples_per_second": 85.18,
44
+ "eval_steps_per_second": 2.434,
45
+ "step": 8
46
+ },
47
+ {
48
+ "epoch": 5.0,
49
+ "learning_rate": 1.25e-05,
50
+ "loss": 1.0064,
51
+ "step": 10
52
+ },
53
+ {
54
+ "epoch": 5.0,
55
+ "eval_accuracy": 0.6571428571428571,
56
+ "eval_loss": 0.9641380310058594,
57
+ "eval_runtime": 0.6452,
58
+ "eval_samples_per_second": 108.491,
59
+ "eval_steps_per_second": 3.1,
60
+ "step": 10
61
+ },
62
+ {
63
+ "epoch": 6.0,
64
+ "eval_accuracy": 0.6571428571428571,
65
+ "eval_loss": 0.9635317921638489,
66
+ "eval_runtime": 0.6347,
67
+ "eval_samples_per_second": 110.284,
68
+ "eval_steps_per_second": 3.151,
69
+ "step": 12
70
+ },
71
+ {
72
+ "epoch": 7.0,
73
+ "eval_accuracy": 0.6571428571428571,
74
+ "eval_loss": 0.9628700017929077,
75
+ "eval_runtime": 0.8273,
76
+ "eval_samples_per_second": 84.611,
77
+ "eval_steps_per_second": 2.417,
78
+ "step": 14
79
+ },
80
+ {
81
+ "epoch": 8.0,
82
+ "eval_accuracy": 0.6571428571428571,
83
+ "eval_loss": 0.9623274803161621,
84
+ "eval_runtime": 0.6551,
85
+ "eval_samples_per_second": 106.859,
86
+ "eval_steps_per_second": 3.053,
87
+ "step": 16
88
+ },
89
+ {
90
+ "epoch": 9.0,
91
+ "eval_accuracy": 0.6571428571428571,
92
+ "eval_loss": 0.9616996645927429,
93
+ "eval_runtime": 0.646,
94
+ "eval_samples_per_second": 108.363,
95
+ "eval_steps_per_second": 3.096,
96
+ "step": 18
97
+ },
98
+ {
99
+ "epoch": 10.0,
100
+ "learning_rate": 2.5e-05,
101
+ "loss": 0.9821,
102
+ "step": 20
103
+ },
104
+ {
105
+ "epoch": 10.0,
106
+ "eval_accuracy": 0.6571428571428571,
107
+ "eval_loss": 0.9611372947692871,
108
+ "eval_runtime": 0.8313,
109
+ "eval_samples_per_second": 84.202,
110
+ "eval_steps_per_second": 2.406,
111
+ "step": 20
112
+ },
113
+ {
114
+ "epoch": 11.0,
115
+ "eval_accuracy": 0.6571428571428571,
116
+ "eval_loss": 0.9607454538345337,
117
+ "eval_runtime": 0.8335,
118
+ "eval_samples_per_second": 83.985,
119
+ "eval_steps_per_second": 2.4,
120
+ "step": 22
121
+ },
122
+ {
123
+ "epoch": 12.0,
124
+ "eval_accuracy": 0.6714285714285714,
125
+ "eval_loss": 0.9604489207267761,
126
+ "eval_runtime": 0.8194,
127
+ "eval_samples_per_second": 85.429,
128
+ "eval_steps_per_second": 2.441,
129
+ "step": 24
130
+ },
131
+ {
132
+ "epoch": 13.0,
133
+ "eval_accuracy": 0.6714285714285714,
134
+ "eval_loss": 0.9601203799247742,
135
+ "eval_runtime": 0.8211,
136
+ "eval_samples_per_second": 85.256,
137
+ "eval_steps_per_second": 2.436,
138
+ "step": 26
139
+ },
140
+ {
141
+ "epoch": 14.0,
142
+ "eval_accuracy": 0.6714285714285714,
143
+ "eval_loss": 0.9597390294075012,
144
+ "eval_runtime": 0.6563,
145
+ "eval_samples_per_second": 106.663,
146
+ "eval_steps_per_second": 3.048,
147
+ "step": 28
148
+ },
149
+ {
150
+ "epoch": 15.0,
151
+ "learning_rate": 3.7500000000000003e-05,
152
+ "loss": 1.0278,
153
+ "step": 30
154
+ },
155
+ {
156
+ "epoch": 15.0,
157
+ "eval_accuracy": 0.6714285714285714,
158
+ "eval_loss": 0.9591529965400696,
159
+ "eval_runtime": 0.6495,
160
+ "eval_samples_per_second": 107.778,
161
+ "eval_steps_per_second": 3.079,
162
+ "step": 30
163
+ },
164
+ {
165
+ "epoch": 16.0,
166
+ "eval_accuracy": 0.6714285714285714,
167
+ "eval_loss": 0.9581246376037598,
168
+ "eval_runtime": 0.791,
169
+ "eval_samples_per_second": 88.495,
170
+ "eval_steps_per_second": 2.528,
171
+ "step": 32
172
+ },
173
+ {
174
+ "epoch": 17.0,
175
+ "eval_accuracy": 0.6714285714285714,
176
+ "eval_loss": 0.9566996097564697,
177
+ "eval_runtime": 0.6461,
178
+ "eval_samples_per_second": 108.347,
179
+ "eval_steps_per_second": 3.096,
180
+ "step": 34
181
+ },
182
+ {
183
+ "epoch": 18.0,
184
+ "eval_accuracy": 0.6714285714285714,
185
+ "eval_loss": 0.9551236629486084,
186
+ "eval_runtime": 0.6456,
187
+ "eval_samples_per_second": 108.429,
188
+ "eval_steps_per_second": 3.098,
189
+ "step": 36
190
+ },
191
+ {
192
+ "epoch": 19.0,
193
+ "eval_accuracy": 0.6714285714285714,
194
+ "eval_loss": 0.9534342288970947,
195
+ "eval_runtime": 0.8038,
196
+ "eval_samples_per_second": 87.083,
197
+ "eval_steps_per_second": 2.488,
198
+ "step": 38
199
+ },
200
+ {
201
+ "epoch": 20.0,
202
+ "learning_rate": 5e-05,
203
+ "loss": 0.9986,
204
+ "step": 40
205
+ },
206
+ {
207
+ "epoch": 20.0,
208
+ "eval_accuracy": 0.6571428571428571,
209
+ "eval_loss": 0.9513913989067078,
210
+ "eval_runtime": 0.6423,
211
+ "eval_samples_per_second": 108.98,
212
+ "eval_steps_per_second": 3.114,
213
+ "step": 40
214
+ },
215
+ {
216
+ "epoch": 21.0,
217
+ "eval_accuracy": 0.6571428571428571,
218
+ "eval_loss": 0.9493252635002136,
219
+ "eval_runtime": 0.6401,
220
+ "eval_samples_per_second": 109.357,
221
+ "eval_steps_per_second": 3.124,
222
+ "step": 42
223
+ },
224
+ {
225
+ "epoch": 22.0,
226
+ "eval_accuracy": 0.6428571428571429,
227
+ "eval_loss": 0.9471749663352966,
228
+ "eval_runtime": 0.7957,
229
+ "eval_samples_per_second": 87.97,
230
+ "eval_steps_per_second": 2.513,
231
+ "step": 44
232
+ },
233
+ {
234
+ "epoch": 23.0,
235
+ "eval_accuracy": 0.6428571428571429,
236
+ "eval_loss": 0.9451875686645508,
237
+ "eval_runtime": 0.6379,
238
+ "eval_samples_per_second": 109.728,
239
+ "eval_steps_per_second": 3.135,
240
+ "step": 46
241
+ },
242
+ {
243
+ "epoch": 24.0,
244
+ "eval_accuracy": 0.6428571428571429,
245
+ "eval_loss": 0.943417489528656,
246
+ "eval_runtime": 0.6466,
247
+ "eval_samples_per_second": 108.259,
248
+ "eval_steps_per_second": 3.093,
249
+ "step": 48
250
+ },
251
+ {
252
+ "epoch": 25.0,
253
+ "learning_rate": 6.25e-05,
254
+ "loss": 0.9973,
255
+ "step": 50
256
+ },
257
+ {
258
+ "epoch": 25.0,
259
+ "eval_accuracy": 0.6428571428571429,
260
+ "eval_loss": 0.9419717788696289,
261
+ "eval_runtime": 0.8115,
262
+ "eval_samples_per_second": 86.264,
263
+ "eval_steps_per_second": 2.465,
264
+ "step": 50
265
+ },
266
+ {
267
+ "epoch": 26.0,
268
+ "eval_accuracy": 0.6428571428571429,
269
+ "eval_loss": 0.9404588937759399,
270
+ "eval_runtime": 0.6332,
271
+ "eval_samples_per_second": 110.551,
272
+ "eval_steps_per_second": 3.159,
273
+ "step": 52
274
+ },
275
+ {
276
+ "epoch": 27.0,
277
+ "eval_accuracy": 0.6285714285714286,
278
+ "eval_loss": 0.9387302994728088,
279
+ "eval_runtime": 0.64,
280
+ "eval_samples_per_second": 109.375,
281
+ "eval_steps_per_second": 3.125,
282
+ "step": 54
283
+ },
284
+ {
285
+ "epoch": 28.0,
286
+ "eval_accuracy": 0.6285714285714286,
287
+ "eval_loss": 0.9375677704811096,
288
+ "eval_runtime": 0.8312,
289
+ "eval_samples_per_second": 84.219,
290
+ "eval_steps_per_second": 2.406,
291
+ "step": 56
292
+ },
293
+ {
294
+ "epoch": 29.0,
295
+ "eval_accuracy": 0.6428571428571429,
296
+ "eval_loss": 0.9368333220481873,
297
+ "eval_runtime": 0.6385,
298
+ "eval_samples_per_second": 109.629,
299
+ "eval_steps_per_second": 3.132,
300
+ "step": 58
301
+ },
302
+ {
303
+ "epoch": 30.0,
304
+ "learning_rate": 7.500000000000001e-05,
305
+ "loss": 0.9936,
306
+ "step": 60
307
+ },
308
+ {
309
+ "epoch": 30.0,
310
+ "eval_accuracy": 0.6428571428571429,
311
+ "eval_loss": 0.9361710548400879,
312
+ "eval_runtime": 0.6573,
313
+ "eval_samples_per_second": 106.497,
314
+ "eval_steps_per_second": 3.043,
315
+ "step": 60
316
+ },
317
+ {
318
+ "epoch": 31.0,
319
+ "eval_accuracy": 0.6571428571428571,
320
+ "eval_loss": 0.9361298680305481,
321
+ "eval_runtime": 0.7944,
322
+ "eval_samples_per_second": 88.115,
323
+ "eval_steps_per_second": 2.518,
324
+ "step": 62
325
+ },
326
+ {
327
+ "epoch": 32.0,
328
+ "eval_accuracy": 0.6714285714285714,
329
+ "eval_loss": 0.9364449381828308,
330
+ "eval_runtime": 0.6554,
331
+ "eval_samples_per_second": 106.808,
332
+ "eval_steps_per_second": 3.052,
333
+ "step": 64
334
+ },
335
+ {
336
+ "epoch": 33.0,
337
+ "eval_accuracy": 0.6714285714285714,
338
+ "eval_loss": 0.9371016621589661,
339
+ "eval_runtime": 0.6483,
340
+ "eval_samples_per_second": 107.97,
341
+ "eval_steps_per_second": 3.085,
342
+ "step": 66
343
+ },
344
+ {
345
+ "epoch": 34.0,
346
+ "eval_accuracy": 0.6428571428571429,
347
+ "eval_loss": 0.9379546046257019,
348
+ "eval_runtime": 0.8119,
349
+ "eval_samples_per_second": 86.219,
350
+ "eval_steps_per_second": 2.463,
351
+ "step": 68
352
+ },
353
+ {
354
+ "epoch": 35.0,
355
+ "learning_rate": 8.75e-05,
356
+ "loss": 0.9746,
357
+ "step": 70
358
+ },
359
+ {
360
+ "epoch": 35.0,
361
+ "eval_accuracy": 0.6571428571428571,
362
+ "eval_loss": 0.9379692077636719,
363
+ "eval_runtime": 0.6362,
364
+ "eval_samples_per_second": 110.031,
365
+ "eval_steps_per_second": 3.144,
366
+ "step": 70
367
+ },
368
+ {
369
+ "epoch": 36.0,
370
+ "eval_accuracy": 0.6714285714285714,
371
+ "eval_loss": 0.9374780654907227,
372
+ "eval_runtime": 0.639,
373
+ "eval_samples_per_second": 109.543,
374
+ "eval_steps_per_second": 3.13,
375
+ "step": 72
376
+ },
377
+ {
378
+ "epoch": 37.0,
379
+ "eval_accuracy": 0.6714285714285714,
380
+ "eval_loss": 0.9379698634147644,
381
+ "eval_runtime": 0.8343,
382
+ "eval_samples_per_second": 83.899,
383
+ "eval_steps_per_second": 2.397,
384
+ "step": 74
385
+ },
386
+ {
387
+ "epoch": 38.0,
388
+ "eval_accuracy": 0.6714285714285714,
389
+ "eval_loss": 0.9375231862068176,
390
+ "eval_runtime": 0.6395,
391
+ "eval_samples_per_second": 109.457,
392
+ "eval_steps_per_second": 3.127,
393
+ "step": 76
394
+ },
395
+ {
396
+ "epoch": 39.0,
397
+ "eval_accuracy": 0.6714285714285714,
398
+ "eval_loss": 0.9369739890098572,
399
+ "eval_runtime": 0.6333,
400
+ "eval_samples_per_second": 110.536,
401
+ "eval_steps_per_second": 3.158,
402
+ "step": 78
403
+ },
404
+ {
405
+ "epoch": 40.0,
406
+ "learning_rate": 0.0001,
407
+ "loss": 1.0113,
408
+ "step": 80
409
+ },
410
+ {
411
+ "epoch": 40.0,
412
+ "eval_accuracy": 0.6714285714285714,
413
+ "eval_loss": 0.9361743330955505,
414
+ "eval_runtime": 0.7993,
415
+ "eval_samples_per_second": 87.579,
416
+ "eval_steps_per_second": 2.502,
417
+ "step": 80
418
+ },
419
+ {
420
+ "epoch": 41.0,
421
+ "eval_accuracy": 0.6714285714285714,
422
+ "eval_loss": 0.9340663552284241,
423
+ "eval_runtime": 0.6461,
424
+ "eval_samples_per_second": 108.348,
425
+ "eval_steps_per_second": 3.096,
426
+ "step": 82
427
+ },
428
+ {
429
+ "epoch": 42.0,
430
+ "eval_accuracy": 0.6857142857142857,
431
+ "eval_loss": 0.9300563335418701,
432
+ "eval_runtime": 0.636,
433
+ "eval_samples_per_second": 110.058,
434
+ "eval_steps_per_second": 3.145,
435
+ "step": 84
436
+ },
437
+ {
438
+ "epoch": 43.0,
439
+ "eval_accuracy": 0.6714285714285714,
440
+ "eval_loss": 0.9259787201881409,
441
+ "eval_runtime": 0.8154,
442
+ "eval_samples_per_second": 85.845,
443
+ "eval_steps_per_second": 2.453,
444
+ "step": 86
445
+ },
446
+ {
447
+ "epoch": 44.0,
448
+ "eval_accuracy": 0.6571428571428571,
449
+ "eval_loss": 0.9224489331245422,
450
+ "eval_runtime": 0.6369,
451
+ "eval_samples_per_second": 109.903,
452
+ "eval_steps_per_second": 3.14,
453
+ "step": 88
454
+ },
455
+ {
456
+ "epoch": 45.0,
457
+ "learning_rate": 9.687500000000001e-05,
458
+ "loss": 0.9756,
459
+ "step": 90
460
+ },
461
+ {
462
+ "epoch": 45.0,
463
+ "eval_accuracy": 0.6714285714285714,
464
+ "eval_loss": 0.9190067648887634,
465
+ "eval_runtime": 0.6388,
466
+ "eval_samples_per_second": 109.577,
467
+ "eval_steps_per_second": 3.131,
468
+ "step": 90
469
+ },
470
+ {
471
+ "epoch": 46.0,
472
+ "eval_accuracy": 0.6714285714285714,
473
+ "eval_loss": 0.9154108166694641,
474
+ "eval_runtime": 0.7966,
475
+ "eval_samples_per_second": 87.873,
476
+ "eval_steps_per_second": 2.511,
477
+ "step": 92
478
+ },
479
+ {
480
+ "epoch": 47.0,
481
+ "eval_accuracy": 0.6714285714285714,
482
+ "eval_loss": 0.912346363067627,
483
+ "eval_runtime": 0.6406,
484
+ "eval_samples_per_second": 109.268,
485
+ "eval_steps_per_second": 3.122,
486
+ "step": 94
487
+ },
488
+ {
489
+ "epoch": 48.0,
490
+ "eval_accuracy": 0.6571428571428571,
491
+ "eval_loss": 0.9091367721557617,
492
+ "eval_runtime": 0.6398,
493
+ "eval_samples_per_second": 109.41,
494
+ "eval_steps_per_second": 3.126,
495
+ "step": 96
496
+ },
497
+ {
498
+ "epoch": 49.0,
499
+ "eval_accuracy": 0.6571428571428571,
500
+ "eval_loss": 0.9070726037025452,
501
+ "eval_runtime": 0.8188,
502
+ "eval_samples_per_second": 85.488,
503
+ "eval_steps_per_second": 2.443,
504
+ "step": 98
505
+ },
506
+ {
507
+ "epoch": 50.0,
508
+ "learning_rate": 9.375e-05,
509
+ "loss": 0.9721,
510
+ "step": 100
511
+ },
512
+ {
513
+ "epoch": 50.0,
514
+ "eval_accuracy": 0.6571428571428571,
515
+ "eval_loss": 0.9055730700492859,
516
+ "eval_runtime": 0.6361,
517
+ "eval_samples_per_second": 110.054,
518
+ "eval_steps_per_second": 3.144,
519
+ "step": 100
520
+ },
521
+ {
522
+ "epoch": 51.0,
523
+ "eval_accuracy": 0.6571428571428571,
524
+ "eval_loss": 0.9046576619148254,
525
+ "eval_runtime": 0.6407,
526
+ "eval_samples_per_second": 109.252,
527
+ "eval_steps_per_second": 3.121,
528
+ "step": 102
529
+ },
530
+ {
531
+ "epoch": 52.0,
532
+ "eval_accuracy": 0.6571428571428571,
533
+ "eval_loss": 0.9038794636726379,
534
+ "eval_runtime": 0.8178,
535
+ "eval_samples_per_second": 85.592,
536
+ "eval_steps_per_second": 2.445,
537
+ "step": 104
538
+ },
539
+ {
540
+ "epoch": 53.0,
541
+ "eval_accuracy": 0.6714285714285714,
542
+ "eval_loss": 0.9030665755271912,
543
+ "eval_runtime": 0.6283,
544
+ "eval_samples_per_second": 111.419,
545
+ "eval_steps_per_second": 3.183,
546
+ "step": 106
547
+ },
548
+ {
549
+ "epoch": 54.0,
550
+ "eval_accuracy": 0.6714285714285714,
551
+ "eval_loss": 0.902490496635437,
552
+ "eval_runtime": 0.8366,
553
+ "eval_samples_per_second": 83.669,
554
+ "eval_steps_per_second": 2.391,
555
+ "step": 108
556
+ },
557
+ {
558
+ "epoch": 55.0,
559
+ "learning_rate": 9.062500000000001e-05,
560
+ "loss": 0.9698,
561
+ "step": 110
562
+ },
563
+ {
564
+ "epoch": 55.0,
565
+ "eval_accuracy": 0.6714285714285714,
566
+ "eval_loss": 0.902264416217804,
567
+ "eval_runtime": 0.9891,
568
+ "eval_samples_per_second": 70.774,
569
+ "eval_steps_per_second": 2.022,
570
+ "step": 110
571
+ },
572
+ {
573
+ "epoch": 56.0,
574
+ "eval_accuracy": 0.6714285714285714,
575
+ "eval_loss": 0.9011555314064026,
576
+ "eval_runtime": 0.6498,
577
+ "eval_samples_per_second": 107.729,
578
+ "eval_steps_per_second": 3.078,
579
+ "step": 112
580
+ },
581
+ {
582
+ "epoch": 57.0,
583
+ "eval_accuracy": 0.6714285714285714,
584
+ "eval_loss": 0.8996686935424805,
585
+ "eval_runtime": 0.8289,
586
+ "eval_samples_per_second": 84.447,
587
+ "eval_steps_per_second": 2.413,
588
+ "step": 114
589
+ },
590
+ {
591
+ "epoch": 58.0,
592
+ "eval_accuracy": 0.6714285714285714,
593
+ "eval_loss": 0.8982025980949402,
594
+ "eval_runtime": 0.6375,
595
+ "eval_samples_per_second": 109.798,
596
+ "eval_steps_per_second": 3.137,
597
+ "step": 116
598
+ },
599
+ {
600
+ "epoch": 59.0,
601
+ "eval_accuracy": 0.6714285714285714,
602
+ "eval_loss": 0.8969982266426086,
603
+ "eval_runtime": 0.6483,
604
+ "eval_samples_per_second": 107.97,
605
+ "eval_steps_per_second": 3.085,
606
+ "step": 118
607
+ },
608
+ {
609
+ "epoch": 60.0,
610
+ "learning_rate": 8.75e-05,
611
+ "loss": 0.9341,
612
+ "step": 120
613
+ },
614
+ {
615
+ "epoch": 60.0,
616
+ "eval_accuracy": 0.6857142857142857,
617
+ "eval_loss": 0.8956836462020874,
618
+ "eval_runtime": 0.8303,
619
+ "eval_samples_per_second": 84.307,
620
+ "eval_steps_per_second": 2.409,
621
+ "step": 120
622
+ },
623
+ {
624
+ "epoch": 61.0,
625
+ "eval_accuracy": 0.7,
626
+ "eval_loss": 0.8946982622146606,
627
+ "eval_runtime": 0.6483,
628
+ "eval_samples_per_second": 107.981,
629
+ "eval_steps_per_second": 3.085,
630
+ "step": 122
631
+ },
632
+ {
633
+ "epoch": 62.0,
634
+ "eval_accuracy": 0.7,
635
+ "eval_loss": 0.8940390348434448,
636
+ "eval_runtime": 0.6421,
637
+ "eval_samples_per_second": 109.023,
638
+ "eval_steps_per_second": 3.115,
639
+ "step": 124
640
+ },
641
+ {
642
+ "epoch": 63.0,
643
+ "eval_accuracy": 0.6714285714285714,
644
+ "eval_loss": 0.8940520286560059,
645
+ "eval_runtime": 0.8356,
646
+ "eval_samples_per_second": 83.773,
647
+ "eval_steps_per_second": 2.394,
648
+ "step": 126
649
+ },
650
+ {
651
+ "epoch": 64.0,
652
+ "eval_accuracy": 0.6714285714285714,
653
+ "eval_loss": 0.8934383988380432,
654
+ "eval_runtime": 0.6317,
655
+ "eval_samples_per_second": 110.812,
656
+ "eval_steps_per_second": 3.166,
657
+ "step": 128
658
+ },
659
+ {
660
+ "epoch": 65.0,
661
+ "learning_rate": 8.4375e-05,
662
+ "loss": 0.9717,
663
+ "step": 130
664
+ },
665
+ {
666
+ "epoch": 65.0,
667
+ "eval_accuracy": 0.6714285714285714,
668
+ "eval_loss": 0.8916982412338257,
669
+ "eval_runtime": 0.6456,
670
+ "eval_samples_per_second": 108.418,
671
+ "eval_steps_per_second": 3.098,
672
+ "step": 130
673
+ },
674
+ {
675
+ "epoch": 66.0,
676
+ "eval_accuracy": 0.6857142857142857,
677
+ "eval_loss": 0.8898113369941711,
678
+ "eval_runtime": 0.8145,
679
+ "eval_samples_per_second": 85.937,
680
+ "eval_steps_per_second": 2.455,
681
+ "step": 132
682
+ },
683
+ {
684
+ "epoch": 67.0,
685
+ "eval_accuracy": 0.6857142857142857,
686
+ "eval_loss": 0.8883917927742004,
687
+ "eval_runtime": 0.6387,
688
+ "eval_samples_per_second": 109.599,
689
+ "eval_steps_per_second": 3.131,
690
+ "step": 134
691
+ },
692
+ {
693
+ "epoch": 68.0,
694
+ "eval_accuracy": 0.6857142857142857,
695
+ "eval_loss": 0.8869962692260742,
696
+ "eval_runtime": 0.6406,
697
+ "eval_samples_per_second": 109.266,
698
+ "eval_steps_per_second": 3.122,
699
+ "step": 136
700
+ },
701
+ {
702
+ "epoch": 69.0,
703
+ "eval_accuracy": 0.6857142857142857,
704
+ "eval_loss": 0.8853691816329956,
705
+ "eval_runtime": 0.8216,
706
+ "eval_samples_per_second": 85.2,
707
+ "eval_steps_per_second": 2.434,
708
+ "step": 138
709
+ },
710
+ {
711
+ "epoch": 70.0,
712
+ "learning_rate": 8.125000000000001e-05,
713
+ "loss": 0.9655,
714
+ "step": 140
715
+ },
716
+ {
717
+ "epoch": 70.0,
718
+ "eval_accuracy": 0.6857142857142857,
719
+ "eval_loss": 0.8840075731277466,
720
+ "eval_runtime": 0.6378,
721
+ "eval_samples_per_second": 109.751,
722
+ "eval_steps_per_second": 3.136,
723
+ "step": 140
724
+ },
725
+ {
726
+ "epoch": 71.0,
727
+ "eval_accuracy": 0.6857142857142857,
728
+ "eval_loss": 0.8826519250869751,
729
+ "eval_runtime": 0.6384,
730
+ "eval_samples_per_second": 109.644,
731
+ "eval_steps_per_second": 3.133,
732
+ "step": 142
733
+ },
734
+ {
735
+ "epoch": 72.0,
736
+ "eval_accuracy": 0.6857142857142857,
737
+ "eval_loss": 0.8813565373420715,
738
+ "eval_runtime": 0.8402,
739
+ "eval_samples_per_second": 83.313,
740
+ "eval_steps_per_second": 2.38,
741
+ "step": 144
742
+ },
743
+ {
744
+ "epoch": 73.0,
745
+ "eval_accuracy": 0.6857142857142857,
746
+ "eval_loss": 0.8805155754089355,
747
+ "eval_runtime": 0.6428,
748
+ "eval_samples_per_second": 108.905,
749
+ "eval_steps_per_second": 3.112,
750
+ "step": 146
751
+ },
752
+ {
753
+ "epoch": 74.0,
754
+ "eval_accuracy": 0.6857142857142857,
755
+ "eval_loss": 0.8803040385246277,
756
+ "eval_runtime": 0.649,
757
+ "eval_samples_per_second": 107.857,
758
+ "eval_steps_per_second": 3.082,
759
+ "step": 148
760
+ },
761
+ {
762
+ "epoch": 75.0,
763
+ "learning_rate": 7.8125e-05,
764
+ "loss": 0.9458,
765
+ "step": 150
766
+ },
767
+ {
768
+ "epoch": 75.0,
769
+ "eval_accuracy": 0.6857142857142857,
770
+ "eval_loss": 0.8801725506782532,
771
+ "eval_runtime": 0.82,
772
+ "eval_samples_per_second": 85.365,
773
+ "eval_steps_per_second": 2.439,
774
+ "step": 150
775
+ },
776
+ {
777
+ "epoch": 76.0,
778
+ "eval_accuracy": 0.6714285714285714,
779
+ "eval_loss": 0.8797475695610046,
780
+ "eval_runtime": 0.6476,
781
+ "eval_samples_per_second": 108.085,
782
+ "eval_steps_per_second": 3.088,
783
+ "step": 152
784
+ },
785
+ {
786
+ "epoch": 77.0,
787
+ "eval_accuracy": 0.6714285714285714,
788
+ "eval_loss": 0.8793725967407227,
789
+ "eval_runtime": 0.6468,
790
+ "eval_samples_per_second": 108.22,
791
+ "eval_steps_per_second": 3.092,
792
+ "step": 154
793
+ },
794
+ {
795
+ "epoch": 78.0,
796
+ "eval_accuracy": 0.6714285714285714,
797
+ "eval_loss": 0.8795827031135559,
798
+ "eval_runtime": 0.8346,
799
+ "eval_samples_per_second": 83.873,
800
+ "eval_steps_per_second": 2.396,
801
+ "step": 156
802
+ },
803
+ {
804
+ "epoch": 79.0,
805
+ "eval_accuracy": 0.6714285714285714,
806
+ "eval_loss": 0.8807878494262695,
807
+ "eval_runtime": 0.6453,
808
+ "eval_samples_per_second": 108.479,
809
+ "eval_steps_per_second": 3.099,
810
+ "step": 158
811
+ },
812
+ {
813
+ "epoch": 80.0,
814
+ "learning_rate": 7.500000000000001e-05,
815
+ "loss": 0.9094,
816
+ "step": 160
817
+ },
818
+ {
819
+ "epoch": 80.0,
820
+ "eval_accuracy": 0.6714285714285714,
821
+ "eval_loss": 0.8817013502120972,
822
+ "eval_runtime": 0.6393,
823
+ "eval_samples_per_second": 109.492,
824
+ "eval_steps_per_second": 3.128,
825
+ "step": 160
826
+ },
827
+ {
828
+ "epoch": 81.0,
829
+ "eval_accuracy": 0.6714285714285714,
830
+ "eval_loss": 0.8828238844871521,
831
+ "eval_runtime": 0.8346,
832
+ "eval_samples_per_second": 83.868,
833
+ "eval_steps_per_second": 2.396,
834
+ "step": 162
835
+ },
836
+ {
837
+ "epoch": 82.0,
838
+ "eval_accuracy": 0.6714285714285714,
839
+ "eval_loss": 0.8835611939430237,
840
+ "eval_runtime": 0.636,
841
+ "eval_samples_per_second": 110.07,
842
+ "eval_steps_per_second": 3.145,
843
+ "step": 164
844
+ },
845
+ {
846
+ "epoch": 83.0,
847
+ "eval_accuracy": 0.6714285714285714,
848
+ "eval_loss": 0.8830356001853943,
849
+ "eval_runtime": 0.6535,
850
+ "eval_samples_per_second": 107.117,
851
+ "eval_steps_per_second": 3.06,
852
+ "step": 166
853
+ },
854
+ {
855
+ "epoch": 84.0,
856
+ "eval_accuracy": 0.6571428571428571,
857
+ "eval_loss": 0.8820751905441284,
858
+ "eval_runtime": 0.8384,
859
+ "eval_samples_per_second": 83.495,
860
+ "eval_steps_per_second": 2.386,
861
+ "step": 168
862
+ },
863
+ {
864
+ "epoch": 85.0,
865
+ "learning_rate": 7.1875e-05,
866
+ "loss": 0.8719,
867
+ "step": 170
868
+ },
869
+ {
870
+ "epoch": 85.0,
871
+ "eval_accuracy": 0.6571428571428571,
872
+ "eval_loss": 0.8812506794929504,
873
+ "eval_runtime": 0.6519,
874
+ "eval_samples_per_second": 107.372,
875
+ "eval_steps_per_second": 3.068,
876
+ "step": 170
877
+ },
878
+ {
879
+ "epoch": 86.0,
880
+ "eval_accuracy": 0.6714285714285714,
881
+ "eval_loss": 0.8804309368133545,
882
+ "eval_runtime": 0.6326,
883
+ "eval_samples_per_second": 110.652,
884
+ "eval_steps_per_second": 3.161,
885
+ "step": 172
886
+ },
887
+ {
888
+ "epoch": 87.0,
889
+ "eval_accuracy": 0.6571428571428571,
890
+ "eval_loss": 0.8798118829727173,
891
+ "eval_runtime": 0.8338,
892
+ "eval_samples_per_second": 83.95,
893
+ "eval_steps_per_second": 2.399,
894
+ "step": 174
895
+ },
896
+ {
897
+ "epoch": 88.0,
898
+ "eval_accuracy": 0.6571428571428571,
899
+ "eval_loss": 0.8787184953689575,
900
+ "eval_runtime": 0.64,
901
+ "eval_samples_per_second": 109.38,
902
+ "eval_steps_per_second": 3.125,
903
+ "step": 176
904
+ },
905
+ {
906
+ "epoch": 89.0,
907
+ "eval_accuracy": 0.6571428571428571,
908
+ "eval_loss": 0.8769770264625549,
909
+ "eval_runtime": 0.6382,
910
+ "eval_samples_per_second": 109.679,
911
+ "eval_steps_per_second": 3.134,
912
+ "step": 178
913
+ },
914
+ {
915
+ "epoch": 90.0,
916
+ "learning_rate": 6.875e-05,
917
+ "loss": 0.9288,
918
+ "step": 180
919
+ },
920
+ {
921
+ "epoch": 90.0,
922
+ "eval_accuracy": 0.6857142857142857,
923
+ "eval_loss": 0.8752025961875916,
924
+ "eval_runtime": 0.8649,
925
+ "eval_samples_per_second": 80.934,
926
+ "eval_steps_per_second": 2.312,
927
+ "step": 180
928
+ },
929
+ {
930
+ "epoch": 91.0,
931
+ "eval_accuracy": 0.6857142857142857,
932
+ "eval_loss": 0.8721939921379089,
933
+ "eval_runtime": 0.6536,
934
+ "eval_samples_per_second": 107.101,
935
+ "eval_steps_per_second": 3.06,
936
+ "step": 182
937
+ },
938
+ {
939
+ "epoch": 92.0,
940
+ "eval_accuracy": 0.6714285714285714,
941
+ "eval_loss": 0.8693682551383972,
942
+ "eval_runtime": 0.6434,
943
+ "eval_samples_per_second": 108.799,
944
+ "eval_steps_per_second": 3.109,
945
+ "step": 184
946
+ },
947
+ {
948
+ "epoch": 93.0,
949
+ "eval_accuracy": 0.6714285714285714,
950
+ "eval_loss": 0.8670406937599182,
951
+ "eval_runtime": 0.8337,
952
+ "eval_samples_per_second": 83.963,
953
+ "eval_steps_per_second": 2.399,
954
+ "step": 186
955
+ },
956
+ {
957
+ "epoch": 94.0,
958
+ "eval_accuracy": 0.6857142857142857,
959
+ "eval_loss": 0.8644655346870422,
960
+ "eval_runtime": 0.6432,
961
+ "eval_samples_per_second": 108.826,
962
+ "eval_steps_per_second": 3.109,
963
+ "step": 188
964
+ },
965
+ {
966
+ "epoch": 95.0,
967
+ "learning_rate": 6.562500000000001e-05,
968
+ "loss": 0.9039,
969
+ "step": 190
970
+ },
971
+ {
972
+ "epoch": 95.0,
973
+ "eval_accuracy": 0.6857142857142857,
974
+ "eval_loss": 0.8624207973480225,
975
+ "eval_runtime": 0.6482,
976
+ "eval_samples_per_second": 107.999,
977
+ "eval_steps_per_second": 3.086,
978
+ "step": 190
979
+ },
980
+ {
981
+ "epoch": 96.0,
982
+ "eval_accuracy": 0.6714285714285714,
983
+ "eval_loss": 0.8603058457374573,
984
+ "eval_runtime": 0.8409,
985
+ "eval_samples_per_second": 83.249,
986
+ "eval_steps_per_second": 2.379,
987
+ "step": 192
988
+ },
989
+ {
990
+ "epoch": 97.0,
991
+ "eval_accuracy": 0.6857142857142857,
992
+ "eval_loss": 0.8583868741989136,
993
+ "eval_runtime": 0.6484,
994
+ "eval_samples_per_second": 107.951,
995
+ "eval_steps_per_second": 3.084,
996
+ "step": 194
997
+ },
998
+ {
999
+ "epoch": 98.0,
1000
+ "eval_accuracy": 0.6857142857142857,
1001
+ "eval_loss": 0.8566268086433411,
1002
+ "eval_runtime": 0.6949,
1003
+ "eval_samples_per_second": 100.728,
1004
+ "eval_steps_per_second": 2.878,
1005
+ "step": 196
1006
+ },
1007
+ {
1008
+ "epoch": 99.0,
1009
+ "eval_accuracy": 0.6857142857142857,
1010
+ "eval_loss": 0.8553413152694702,
1011
+ "eval_runtime": 0.8276,
1012
+ "eval_samples_per_second": 84.585,
1013
+ "eval_steps_per_second": 2.417,
1014
+ "step": 198
1015
+ },
1016
+ {
1017
+ "epoch": 100.0,
1018
+ "learning_rate": 6.25e-05,
1019
+ "loss": 0.9081,
1020
+ "step": 200
1021
+ },
1022
+ {
1023
+ "epoch": 100.0,
1024
+ "eval_accuracy": 0.6857142857142857,
1025
+ "eval_loss": 0.8549684286117554,
1026
+ "eval_runtime": 0.6594,
1027
+ "eval_samples_per_second": 106.164,
1028
+ "eval_steps_per_second": 3.033,
1029
+ "step": 200
1030
+ },
1031
+ {
1032
+ "epoch": 101.0,
1033
+ "eval_accuracy": 0.6857142857142857,
1034
+ "eval_loss": 0.8551309108734131,
1035
+ "eval_runtime": 0.6588,
1036
+ "eval_samples_per_second": 106.255,
1037
+ "eval_steps_per_second": 3.036,
1038
+ "step": 202
1039
+ },
1040
+ {
1041
+ "epoch": 102.0,
1042
+ "eval_accuracy": 0.6857142857142857,
1043
+ "eval_loss": 0.8556391000747681,
1044
+ "eval_runtime": 0.8474,
1045
+ "eval_samples_per_second": 82.605,
1046
+ "eval_steps_per_second": 2.36,
1047
+ "step": 204
1048
+ },
1049
+ {
1050
+ "epoch": 103.0,
1051
+ "eval_accuracy": 0.6857142857142857,
1052
+ "eval_loss": 0.8558002710342407,
1053
+ "eval_runtime": 0.6568,
1054
+ "eval_samples_per_second": 106.577,
1055
+ "eval_steps_per_second": 3.045,
1056
+ "step": 206
1057
+ },
1058
+ {
1059
+ "epoch": 104.0,
1060
+ "eval_accuracy": 0.6857142857142857,
1061
+ "eval_loss": 0.8554455637931824,
1062
+ "eval_runtime": 0.6448,
1063
+ "eval_samples_per_second": 108.569,
1064
+ "eval_steps_per_second": 3.102,
1065
+ "step": 208
1066
+ },
1067
+ {
1068
+ "epoch": 105.0,
1069
+ "learning_rate": 5.9375e-05,
1070
+ "loss": 0.9142,
1071
+ "step": 210
1072
+ },
1073
+ {
1074
+ "epoch": 105.0,
1075
+ "eval_accuracy": 0.6857142857142857,
1076
+ "eval_loss": 0.8551297783851624,
1077
+ "eval_runtime": 0.8226,
1078
+ "eval_samples_per_second": 85.093,
1079
+ "eval_steps_per_second": 2.431,
1080
+ "step": 210
1081
+ },
1082
+ {
1083
+ "epoch": 106.0,
1084
+ "eval_accuracy": 0.6857142857142857,
1085
+ "eval_loss": 0.8553109169006348,
1086
+ "eval_runtime": 0.6501,
1087
+ "eval_samples_per_second": 107.668,
1088
+ "eval_steps_per_second": 3.076,
1089
+ "step": 212
1090
+ },
1091
+ {
1092
+ "epoch": 107.0,
1093
+ "eval_accuracy": 0.6857142857142857,
1094
+ "eval_loss": 0.855134904384613,
1095
+ "eval_runtime": 0.637,
1096
+ "eval_samples_per_second": 109.882,
1097
+ "eval_steps_per_second": 3.139,
1098
+ "step": 214
1099
+ },
1100
+ {
1101
+ "epoch": 108.0,
1102
+ "eval_accuracy": 0.6857142857142857,
1103
+ "eval_loss": 0.8549013137817383,
1104
+ "eval_runtime": 0.8378,
1105
+ "eval_samples_per_second": 83.557,
1106
+ "eval_steps_per_second": 2.387,
1107
+ "step": 216
1108
+ },
1109
+ {
1110
+ "epoch": 109.0,
1111
+ "eval_accuracy": 0.6857142857142857,
1112
+ "eval_loss": 0.854942798614502,
1113
+ "eval_runtime": 0.6596,
1114
+ "eval_samples_per_second": 106.131,
1115
+ "eval_steps_per_second": 3.032,
1116
+ "step": 218
1117
+ },
1118
+ {
1119
+ "epoch": 110.0,
1120
+ "learning_rate": 5.6250000000000005e-05,
1121
+ "loss": 0.9347,
1122
+ "step": 220
1123
+ },
1124
+ {
1125
+ "epoch": 110.0,
1126
+ "eval_accuracy": 0.6714285714285714,
1127
+ "eval_loss": 0.8551362752914429,
1128
+ "eval_runtime": 0.6674,
1129
+ "eval_samples_per_second": 104.886,
1130
+ "eval_steps_per_second": 2.997,
1131
+ "step": 220
1132
+ },
1133
+ {
1134
+ "epoch": 111.0,
1135
+ "eval_accuracy": 0.6714285714285714,
1136
+ "eval_loss": 0.8553721308708191,
1137
+ "eval_runtime": 0.8336,
1138
+ "eval_samples_per_second": 83.974,
1139
+ "eval_steps_per_second": 2.399,
1140
+ "step": 222
1141
+ },
1142
+ {
1143
+ "epoch": 112.0,
1144
+ "eval_accuracy": 0.6714285714285714,
1145
+ "eval_loss": 0.8548364639282227,
1146
+ "eval_runtime": 0.6506,
1147
+ "eval_samples_per_second": 107.599,
1148
+ "eval_steps_per_second": 3.074,
1149
+ "step": 224
1150
+ },
1151
+ {
1152
+ "epoch": 113.0,
1153
+ "eval_accuracy": 0.6714285714285714,
1154
+ "eval_loss": 0.853795051574707,
1155
+ "eval_runtime": 0.6756,
1156
+ "eval_samples_per_second": 103.611,
1157
+ "eval_steps_per_second": 2.96,
1158
+ "step": 226
1159
+ },
1160
+ {
1161
+ "epoch": 114.0,
1162
+ "eval_accuracy": 0.6714285714285714,
1163
+ "eval_loss": 0.8524832129478455,
1164
+ "eval_runtime": 0.8168,
1165
+ "eval_samples_per_second": 85.696,
1166
+ "eval_steps_per_second": 2.448,
1167
+ "step": 228
1168
+ },
1169
+ {
1170
+ "epoch": 115.0,
1171
+ "learning_rate": 5.3125000000000004e-05,
1172
+ "loss": 0.8922,
1173
+ "step": 230
1174
+ },
1175
+ {
1176
+ "epoch": 115.0,
1177
+ "eval_accuracy": 0.6857142857142857,
1178
+ "eval_loss": 0.8512247204780579,
1179
+ "eval_runtime": 0.6476,
1180
+ "eval_samples_per_second": 108.096,
1181
+ "eval_steps_per_second": 3.088,
1182
+ "step": 230
1183
+ },
1184
+ {
1185
+ "epoch": 116.0,
1186
+ "eval_accuracy": 0.6857142857142857,
1187
+ "eval_loss": 0.8505221009254456,
1188
+ "eval_runtime": 0.6563,
1189
+ "eval_samples_per_second": 106.655,
1190
+ "eval_steps_per_second": 3.047,
1191
+ "step": 232
1192
+ },
1193
+ {
1194
+ "epoch": 117.0,
1195
+ "eval_accuracy": 0.6857142857142857,
1196
+ "eval_loss": 0.849509596824646,
1197
+ "eval_runtime": 0.8193,
1198
+ "eval_samples_per_second": 85.434,
1199
+ "eval_steps_per_second": 2.441,
1200
+ "step": 234
1201
+ },
1202
+ {
1203
+ "epoch": 118.0,
1204
+ "eval_accuracy": 0.6857142857142857,
1205
+ "eval_loss": 0.8483795523643494,
1206
+ "eval_runtime": 0.6476,
1207
+ "eval_samples_per_second": 108.094,
1208
+ "eval_steps_per_second": 3.088,
1209
+ "step": 236
1210
+ },
1211
+ {
1212
+ "epoch": 119.0,
1213
+ "eval_accuracy": 0.6857142857142857,
1214
+ "eval_loss": 0.8471851944923401,
1215
+ "eval_runtime": 0.6472,
1216
+ "eval_samples_per_second": 108.158,
1217
+ "eval_steps_per_second": 3.09,
1218
+ "step": 238
1219
+ },
1220
+ {
1221
+ "epoch": 120.0,
1222
+ "learning_rate": 5e-05,
1223
+ "loss": 0.8897,
1224
+ "step": 240
1225
+ },
1226
+ {
1227
+ "epoch": 120.0,
1228
+ "eval_accuracy": 0.6857142857142857,
1229
+ "eval_loss": 0.8455559611320496,
1230
+ "eval_runtime": 0.8155,
1231
+ "eval_samples_per_second": 85.837,
1232
+ "eval_steps_per_second": 2.452,
1233
+ "step": 240
1234
+ },
1235
+ {
1236
+ "epoch": 121.0,
1237
+ "eval_accuracy": 0.6857142857142857,
1238
+ "eval_loss": 0.8439861536026001,
1239
+ "eval_runtime": 0.6794,
1240
+ "eval_samples_per_second": 103.026,
1241
+ "eval_steps_per_second": 2.944,
1242
+ "step": 242
1243
+ },
1244
+ {
1245
+ "epoch": 122.0,
1246
+ "eval_accuracy": 0.6714285714285714,
1247
+ "eval_loss": 0.8426181674003601,
1248
+ "eval_runtime": 0.6386,
1249
+ "eval_samples_per_second": 109.616,
1250
+ "eval_steps_per_second": 3.132,
1251
+ "step": 244
1252
+ },
1253
+ {
1254
+ "epoch": 123.0,
1255
+ "eval_accuracy": 0.6857142857142857,
1256
+ "eval_loss": 0.8412323594093323,
1257
+ "eval_runtime": 0.8222,
1258
+ "eval_samples_per_second": 85.135,
1259
+ "eval_steps_per_second": 2.432,
1260
+ "step": 246
1261
+ },
1262
+ {
1263
+ "epoch": 124.0,
1264
+ "eval_accuracy": 0.6857142857142857,
1265
+ "eval_loss": 0.8395997881889343,
1266
+ "eval_runtime": 0.6405,
1267
+ "eval_samples_per_second": 109.29,
1268
+ "eval_steps_per_second": 3.123,
1269
+ "step": 248
1270
+ },
1271
+ {
1272
+ "epoch": 125.0,
1273
+ "learning_rate": 4.6875e-05,
1274
+ "loss": 0.8829,
1275
+ "step": 250
1276
+ },
1277
+ {
1278
+ "epoch": 125.0,
1279
+ "eval_accuracy": 0.6857142857142857,
1280
+ "eval_loss": 0.8383906483650208,
1281
+ "eval_runtime": 0.6384,
1282
+ "eval_samples_per_second": 109.656,
1283
+ "eval_steps_per_second": 3.133,
1284
+ "step": 250
1285
+ },
1286
+ {
1287
+ "epoch": 126.0,
1288
+ "eval_accuracy": 0.6857142857142857,
1289
+ "eval_loss": 0.8372732996940613,
1290
+ "eval_runtime": 0.8007,
1291
+ "eval_samples_per_second": 87.425,
1292
+ "eval_steps_per_second": 2.498,
1293
+ "step": 252
1294
+ },
1295
+ {
1296
+ "epoch": 127.0,
1297
+ "eval_accuracy": 0.6857142857142857,
1298
+ "eval_loss": 0.8365365266799927,
1299
+ "eval_runtime": 0.6412,
1300
+ "eval_samples_per_second": 109.171,
1301
+ "eval_steps_per_second": 3.119,
1302
+ "step": 254
1303
+ },
1304
+ {
1305
+ "epoch": 128.0,
1306
+ "eval_accuracy": 0.6857142857142857,
1307
+ "eval_loss": 0.835951030254364,
1308
+ "eval_runtime": 0.6518,
1309
+ "eval_samples_per_second": 107.389,
1310
+ "eval_steps_per_second": 3.068,
1311
+ "step": 256
1312
+ },
1313
+ {
1314
+ "epoch": 129.0,
1315
+ "eval_accuracy": 0.6857142857142857,
1316
+ "eval_loss": 0.8352962732315063,
1317
+ "eval_runtime": 0.8209,
1318
+ "eval_samples_per_second": 85.273,
1319
+ "eval_steps_per_second": 2.436,
1320
+ "step": 258
1321
+ },
1322
+ {
1323
+ "epoch": 130.0,
1324
+ "learning_rate": 4.375e-05,
1325
+ "loss": 0.8744,
1326
+ "step": 260
1327
+ },
1328
+ {
1329
+ "epoch": 130.0,
1330
+ "eval_accuracy": 0.6857142857142857,
1331
+ "eval_loss": 0.8344349265098572,
1332
+ "eval_runtime": 0.6608,
1333
+ "eval_samples_per_second": 105.932,
1334
+ "eval_steps_per_second": 3.027,
1335
+ "step": 260
1336
+ },
1337
+ {
1338
+ "epoch": 131.0,
1339
+ "eval_accuracy": 0.6714285714285714,
1340
+ "eval_loss": 0.8336659669876099,
1341
+ "eval_runtime": 0.6503,
1342
+ "eval_samples_per_second": 107.635,
1343
+ "eval_steps_per_second": 3.075,
1344
+ "step": 262
1345
+ },
1346
+ {
1347
+ "epoch": 132.0,
1348
+ "eval_accuracy": 0.6857142857142857,
1349
+ "eval_loss": 0.8329463601112366,
1350
+ "eval_runtime": 0.824,
1351
+ "eval_samples_per_second": 84.952,
1352
+ "eval_steps_per_second": 2.427,
1353
+ "step": 264
1354
+ },
1355
+ {
1356
+ "epoch": 133.0,
1357
+ "eval_accuracy": 0.6857142857142857,
1358
+ "eval_loss": 0.8324605822563171,
1359
+ "eval_runtime": 0.6594,
1360
+ "eval_samples_per_second": 106.156,
1361
+ "eval_steps_per_second": 3.033,
1362
+ "step": 266
1363
+ },
1364
+ {
1365
+ "epoch": 134.0,
1366
+ "eval_accuracy": 0.6857142857142857,
1367
+ "eval_loss": 0.8318061232566833,
1368
+ "eval_runtime": 0.6395,
1369
+ "eval_samples_per_second": 109.457,
1370
+ "eval_steps_per_second": 3.127,
1371
+ "step": 268
1372
+ },
1373
+ {
1374
+ "epoch": 135.0,
1375
+ "learning_rate": 4.0625000000000005e-05,
1376
+ "loss": 0.8657,
1377
+ "step": 270
1378
+ },
1379
+ {
1380
+ "epoch": 135.0,
1381
+ "eval_accuracy": 0.6857142857142857,
1382
+ "eval_loss": 0.8312056660652161,
1383
+ "eval_runtime": 0.8064,
1384
+ "eval_samples_per_second": 86.802,
1385
+ "eval_steps_per_second": 2.48,
1386
+ "step": 270
1387
+ },
1388
+ {
1389
+ "epoch": 136.0,
1390
+ "eval_accuracy": 0.6714285714285714,
1391
+ "eval_loss": 0.8306312561035156,
1392
+ "eval_runtime": 0.645,
1393
+ "eval_samples_per_second": 108.533,
1394
+ "eval_steps_per_second": 3.101,
1395
+ "step": 272
1396
+ },
1397
+ {
1398
+ "epoch": 137.0,
1399
+ "eval_accuracy": 0.6714285714285714,
1400
+ "eval_loss": 0.8299986720085144,
1401
+ "eval_runtime": 0.6678,
1402
+ "eval_samples_per_second": 104.823,
1403
+ "eval_steps_per_second": 2.995,
1404
+ "step": 274
1405
+ },
1406
+ {
1407
+ "epoch": 138.0,
1408
+ "eval_accuracy": 0.6714285714285714,
1409
+ "eval_loss": 0.8296393752098083,
1410
+ "eval_runtime": 0.8159,
1411
+ "eval_samples_per_second": 85.792,
1412
+ "eval_steps_per_second": 2.451,
1413
+ "step": 276
1414
+ },
1415
+ {
1416
+ "epoch": 139.0,
1417
+ "eval_accuracy": 0.6714285714285714,
1418
+ "eval_loss": 0.8294458389282227,
1419
+ "eval_runtime": 0.6396,
1420
+ "eval_samples_per_second": 109.442,
1421
+ "eval_steps_per_second": 3.127,
1422
+ "step": 278
1423
+ },
1424
+ {
1425
+ "epoch": 140.0,
1426
+ "learning_rate": 3.7500000000000003e-05,
1427
+ "loss": 0.9421,
1428
+ "step": 280
1429
+ },
1430
+ {
1431
+ "epoch": 140.0,
1432
+ "eval_accuracy": 0.6714285714285714,
1433
+ "eval_loss": 0.8292441368103027,
1434
+ "eval_runtime": 0.6515,
1435
+ "eval_samples_per_second": 107.445,
1436
+ "eval_steps_per_second": 3.07,
1437
+ "step": 280
1438
+ },
1439
+ {
1440
+ "epoch": 141.0,
1441
+ "eval_accuracy": 0.6714285714285714,
1442
+ "eval_loss": 0.8291121125221252,
1443
+ "eval_runtime": 0.8194,
1444
+ "eval_samples_per_second": 85.428,
1445
+ "eval_steps_per_second": 2.441,
1446
+ "step": 282
1447
+ },
1448
+ {
1449
+ "epoch": 142.0,
1450
+ "eval_accuracy": 0.6714285714285714,
1451
+ "eval_loss": 0.8290067315101624,
1452
+ "eval_runtime": 0.9452,
1453
+ "eval_samples_per_second": 74.057,
1454
+ "eval_steps_per_second": 2.116,
1455
+ "step": 284
1456
+ },
1457
+ {
1458
+ "epoch": 143.0,
1459
+ "eval_accuracy": 0.6857142857142857,
1460
+ "eval_loss": 0.8290221095085144,
1461
+ "eval_runtime": 0.6854,
1462
+ "eval_samples_per_second": 102.129,
1463
+ "eval_steps_per_second": 2.918,
1464
+ "step": 286
1465
+ },
1466
+ {
1467
+ "epoch": 144.0,
1468
+ "eval_accuracy": 0.6857142857142857,
1469
+ "eval_loss": 0.8288514018058777,
1470
+ "eval_runtime": 0.6741,
1471
+ "eval_samples_per_second": 103.846,
1472
+ "eval_steps_per_second": 2.967,
1473
+ "step": 288
1474
+ },
1475
+ {
1476
+ "epoch": 145.0,
1477
+ "learning_rate": 3.4375e-05,
1478
+ "loss": 0.9066,
1479
+ "step": 290
1480
+ },
1481
+ {
1482
+ "epoch": 145.0,
1483
+ "eval_accuracy": 0.6857142857142857,
1484
+ "eval_loss": 0.8286876082420349,
1485
+ "eval_runtime": 0.6545,
1486
+ "eval_samples_per_second": 106.944,
1487
+ "eval_steps_per_second": 3.056,
1488
+ "step": 290
1489
+ },
1490
+ {
1491
+ "epoch": 146.0,
1492
+ "eval_accuracy": 0.6857142857142857,
1493
+ "eval_loss": 0.8290360569953918,
1494
+ "eval_runtime": 0.6611,
1495
+ "eval_samples_per_second": 105.889,
1496
+ "eval_steps_per_second": 3.025,
1497
+ "step": 292
1498
+ },
1499
+ {
1500
+ "epoch": 147.0,
1501
+ "eval_accuracy": 0.6857142857142857,
1502
+ "eval_loss": 0.8293396830558777,
1503
+ "eval_runtime": 0.6543,
1504
+ "eval_samples_per_second": 106.98,
1505
+ "eval_steps_per_second": 3.057,
1506
+ "step": 294
1507
+ },
1508
+ {
1509
+ "epoch": 148.0,
1510
+ "eval_accuracy": 0.6857142857142857,
1511
+ "eval_loss": 0.8294445872306824,
1512
+ "eval_runtime": 0.6455,
1513
+ "eval_samples_per_second": 108.45,
1514
+ "eval_steps_per_second": 3.099,
1515
+ "step": 296
1516
+ },
1517
+ {
1518
+ "epoch": 149.0,
1519
+ "eval_accuracy": 0.6857142857142857,
1520
+ "eval_loss": 0.8294763565063477,
1521
+ "eval_runtime": 0.9727,
1522
+ "eval_samples_per_second": 71.966,
1523
+ "eval_steps_per_second": 2.056,
1524
+ "step": 298
1525
+ },
1526
+ {
1527
+ "epoch": 150.0,
1528
+ "learning_rate": 3.125e-05,
1529
+ "loss": 0.9068,
1530
+ "step": 300
1531
+ },
1532
+ {
1533
+ "epoch": 150.0,
1534
+ "eval_accuracy": 0.6857142857142857,
1535
+ "eval_loss": 0.8295239210128784,
1536
+ "eval_runtime": 0.9775,
1537
+ "eval_samples_per_second": 71.611,
1538
+ "eval_steps_per_second": 2.046,
1539
+ "step": 300
1540
+ },
1541
+ {
1542
+ "epoch": 151.0,
1543
+ "eval_accuracy": 0.6857142857142857,
1544
+ "eval_loss": 0.8294230699539185,
1545
+ "eval_runtime": 0.6644,
1546
+ "eval_samples_per_second": 105.363,
1547
+ "eval_steps_per_second": 3.01,
1548
+ "step": 302
1549
+ },
1550
+ {
1551
+ "epoch": 152.0,
1552
+ "eval_accuracy": 0.6857142857142857,
1553
+ "eval_loss": 0.829305112361908,
1554
+ "eval_runtime": 0.6604,
1555
+ "eval_samples_per_second": 105.994,
1556
+ "eval_steps_per_second": 3.028,
1557
+ "step": 304
1558
+ },
1559
+ {
1560
+ "epoch": 153.0,
1561
+ "eval_accuracy": 0.6857142857142857,
1562
+ "eval_loss": 0.8293172717094421,
1563
+ "eval_runtime": 0.8353,
1564
+ "eval_samples_per_second": 83.803,
1565
+ "eval_steps_per_second": 2.394,
1566
+ "step": 306
1567
+ },
1568
+ {
1569
+ "epoch": 154.0,
1570
+ "eval_accuracy": 0.6857142857142857,
1571
+ "eval_loss": 0.8289957046508789,
1572
+ "eval_runtime": 0.6575,
1573
+ "eval_samples_per_second": 106.469,
1574
+ "eval_steps_per_second": 3.042,
1575
+ "step": 308
1576
+ },
1577
+ {
1578
+ "epoch": 155.0,
1579
+ "learning_rate": 2.8125000000000003e-05,
1580
+ "loss": 0.8715,
1581
+ "step": 310
1582
+ },
1583
+ {
1584
+ "epoch": 155.0,
1585
+ "eval_accuracy": 0.6857142857142857,
1586
+ "eval_loss": 0.8286699056625366,
1587
+ "eval_runtime": 0.6466,
1588
+ "eval_samples_per_second": 108.266,
1589
+ "eval_steps_per_second": 3.093,
1590
+ "step": 310
1591
+ },
1592
+ {
1593
+ "epoch": 156.0,
1594
+ "eval_accuracy": 0.6857142857142857,
1595
+ "eval_loss": 0.8283028602600098,
1596
+ "eval_runtime": 0.8251,
1597
+ "eval_samples_per_second": 84.843,
1598
+ "eval_steps_per_second": 2.424,
1599
+ "step": 312
1600
+ },
1601
+ {
1602
+ "epoch": 157.0,
1603
+ "eval_accuracy": 0.6857142857142857,
1604
+ "eval_loss": 0.8276944160461426,
1605
+ "eval_runtime": 0.6461,
1606
+ "eval_samples_per_second": 108.335,
1607
+ "eval_steps_per_second": 3.095,
1608
+ "step": 314
1609
+ },
1610
+ {
1611
+ "epoch": 158.0,
1612
+ "eval_accuracy": 0.6857142857142857,
1613
+ "eval_loss": 0.827368438243866,
1614
+ "eval_runtime": 0.6771,
1615
+ "eval_samples_per_second": 103.379,
1616
+ "eval_steps_per_second": 2.954,
1617
+ "step": 316
1618
+ },
1619
+ {
1620
+ "epoch": 159.0,
1621
+ "eval_accuracy": 0.6857142857142857,
1622
+ "eval_loss": 0.8269255757331848,
1623
+ "eval_runtime": 0.8454,
1624
+ "eval_samples_per_second": 82.804,
1625
+ "eval_steps_per_second": 2.366,
1626
+ "step": 318
1627
+ },
1628
+ {
1629
+ "epoch": 160.0,
1630
+ "learning_rate": 2.5e-05,
1631
+ "loss": 0.8921,
1632
+ "step": 320
1633
+ },
1634
+ {
1635
+ "epoch": 160.0,
1636
+ "eval_accuracy": 0.6857142857142857,
1637
+ "eval_loss": 0.826560914516449,
1638
+ "eval_runtime": 0.6462,
1639
+ "eval_samples_per_second": 108.325,
1640
+ "eval_steps_per_second": 3.095,
1641
+ "step": 320
1642
+ },
1643
+ {
1644
+ "epoch": 161.0,
1645
+ "eval_accuracy": 0.6857142857142857,
1646
+ "eval_loss": 0.8263527154922485,
1647
+ "eval_runtime": 0.6718,
1648
+ "eval_samples_per_second": 104.193,
1649
+ "eval_steps_per_second": 2.977,
1650
+ "step": 322
1651
+ },
1652
+ {
1653
+ "epoch": 162.0,
1654
+ "eval_accuracy": 0.6857142857142857,
1655
+ "eval_loss": 0.826131284236908,
1656
+ "eval_runtime": 0.8359,
1657
+ "eval_samples_per_second": 83.747,
1658
+ "eval_steps_per_second": 2.393,
1659
+ "step": 324
1660
+ },
1661
+ {
1662
+ "epoch": 163.0,
1663
+ "eval_accuracy": 0.6857142857142857,
1664
+ "eval_loss": 0.8259814977645874,
1665
+ "eval_runtime": 0.6618,
1666
+ "eval_samples_per_second": 105.778,
1667
+ "eval_steps_per_second": 3.022,
1668
+ "step": 326
1669
+ },
1670
+ {
1671
+ "epoch": 164.0,
1672
+ "eval_accuracy": 0.6857142857142857,
1673
+ "eval_loss": 0.8257696032524109,
1674
+ "eval_runtime": 0.6625,
1675
+ "eval_samples_per_second": 105.664,
1676
+ "eval_steps_per_second": 3.019,
1677
+ "step": 328
1678
+ },
1679
+ {
1680
+ "epoch": 165.0,
1681
+ "learning_rate": 2.1875e-05,
1682
+ "loss": 0.8768,
1683
+ "step": 330
1684
+ },
1685
+ {
1686
+ "epoch": 165.0,
1687
+ "eval_accuracy": 0.6857142857142857,
1688
+ "eval_loss": 0.825222373008728,
1689
+ "eval_runtime": 0.8436,
1690
+ "eval_samples_per_second": 82.974,
1691
+ "eval_steps_per_second": 2.371,
1692
+ "step": 330
1693
+ },
1694
+ {
1695
+ "epoch": 166.0,
1696
+ "eval_accuracy": 0.6857142857142857,
1697
+ "eval_loss": 0.8247527480125427,
1698
+ "eval_runtime": 0.6665,
1699
+ "eval_samples_per_second": 105.023,
1700
+ "eval_steps_per_second": 3.001,
1701
+ "step": 332
1702
+ },
1703
+ {
1704
+ "epoch": 167.0,
1705
+ "eval_accuracy": 0.6857142857142857,
1706
+ "eval_loss": 0.8242577910423279,
1707
+ "eval_runtime": 0.6669,
1708
+ "eval_samples_per_second": 104.971,
1709
+ "eval_steps_per_second": 2.999,
1710
+ "step": 334
1711
+ },
1712
+ {
1713
+ "epoch": 168.0,
1714
+ "eval_accuracy": 0.6857142857142857,
1715
+ "eval_loss": 0.8237206339836121,
1716
+ "eval_runtime": 0.8327,
1717
+ "eval_samples_per_second": 84.06,
1718
+ "eval_steps_per_second": 2.402,
1719
+ "step": 336
1720
+ },
1721
+ {
1722
+ "epoch": 169.0,
1723
+ "eval_accuracy": 0.6857142857142857,
1724
+ "eval_loss": 0.8231467604637146,
1725
+ "eval_runtime": 0.6532,
1726
+ "eval_samples_per_second": 107.163,
1727
+ "eval_steps_per_second": 3.062,
1728
+ "step": 338
1729
+ },
1730
+ {
1731
+ "epoch": 170.0,
1732
+ "learning_rate": 1.8750000000000002e-05,
1733
+ "loss": 0.8519,
1734
+ "step": 340
1735
+ },
1736
+ {
1737
+ "epoch": 170.0,
1738
+ "eval_accuracy": 0.6857142857142857,
1739
+ "eval_loss": 0.8226965665817261,
1740
+ "eval_runtime": 0.6591,
1741
+ "eval_samples_per_second": 106.199,
1742
+ "eval_steps_per_second": 3.034,
1743
+ "step": 340
1744
+ },
1745
+ {
1746
+ "epoch": 171.0,
1747
+ "eval_accuracy": 0.6857142857142857,
1748
+ "eval_loss": 0.822342038154602,
1749
+ "eval_runtime": 0.8214,
1750
+ "eval_samples_per_second": 85.216,
1751
+ "eval_steps_per_second": 2.435,
1752
+ "step": 342
1753
+ },
1754
+ {
1755
+ "epoch": 172.0,
1756
+ "eval_accuracy": 0.6857142857142857,
1757
+ "eval_loss": 0.822126030921936,
1758
+ "eval_runtime": 0.6612,
1759
+ "eval_samples_per_second": 105.861,
1760
+ "eval_steps_per_second": 3.025,
1761
+ "step": 344
1762
+ },
1763
+ {
1764
+ "epoch": 173.0,
1765
+ "eval_accuracy": 0.6857142857142857,
1766
+ "eval_loss": 0.8220161199569702,
1767
+ "eval_runtime": 0.6469,
1768
+ "eval_samples_per_second": 108.212,
1769
+ "eval_steps_per_second": 3.092,
1770
+ "step": 346
1771
+ },
1772
+ {
1773
+ "epoch": 174.0,
1774
+ "eval_accuracy": 0.6857142857142857,
1775
+ "eval_loss": 0.8218111991882324,
1776
+ "eval_runtime": 0.8067,
1777
+ "eval_samples_per_second": 86.769,
1778
+ "eval_steps_per_second": 2.479,
1779
+ "step": 348
1780
+ },
1781
+ {
1782
+ "epoch": 175.0,
1783
+ "learning_rate": 1.5625e-05,
1784
+ "loss": 0.92,
1785
+ "step": 350
1786
+ },
1787
+ {
1788
+ "epoch": 175.0,
1789
+ "eval_accuracy": 0.6857142857142857,
1790
+ "eval_loss": 0.821461021900177,
1791
+ "eval_runtime": 0.6484,
1792
+ "eval_samples_per_second": 107.962,
1793
+ "eval_steps_per_second": 3.085,
1794
+ "step": 350
1795
+ },
1796
+ {
1797
+ "epoch": 176.0,
1798
+ "eval_accuracy": 0.7,
1799
+ "eval_loss": 0.8210566639900208,
1800
+ "eval_runtime": 0.6645,
1801
+ "eval_samples_per_second": 105.342,
1802
+ "eval_steps_per_second": 3.01,
1803
+ "step": 352
1804
+ },
1805
+ {
1806
+ "epoch": 177.0,
1807
+ "eval_accuracy": 0.7,
1808
+ "eval_loss": 0.8207017183303833,
1809
+ "eval_runtime": 0.8152,
1810
+ "eval_samples_per_second": 85.873,
1811
+ "eval_steps_per_second": 2.454,
1812
+ "step": 354
1813
+ },
1814
+ {
1815
+ "epoch": 178.0,
1816
+ "eval_accuracy": 0.7,
1817
+ "eval_loss": 0.8204047679901123,
1818
+ "eval_runtime": 0.7773,
1819
+ "eval_samples_per_second": 90.05,
1820
+ "eval_steps_per_second": 2.573,
1821
+ "step": 356
1822
+ },
1823
+ {
1824
+ "epoch": 179.0,
1825
+ "eval_accuracy": 0.7,
1826
+ "eval_loss": 0.8200381398200989,
1827
+ "eval_runtime": 0.6533,
1828
+ "eval_samples_per_second": 107.15,
1829
+ "eval_steps_per_second": 3.061,
1830
+ "step": 358
1831
+ },
1832
+ {
1833
+ "epoch": 180.0,
1834
+ "learning_rate": 1.25e-05,
1835
+ "loss": 0.879,
1836
+ "step": 360
1837
+ },
1838
+ {
1839
+ "epoch": 180.0,
1840
+ "eval_accuracy": 0.7,
1841
+ "eval_loss": 0.8197112083435059,
1842
+ "eval_runtime": 0.8254,
1843
+ "eval_samples_per_second": 84.803,
1844
+ "eval_steps_per_second": 2.423,
1845
+ "step": 360
1846
+ },
1847
+ {
1848
+ "epoch": 181.0,
1849
+ "eval_accuracy": 0.7,
1850
+ "eval_loss": 0.8194140195846558,
1851
+ "eval_runtime": 0.6736,
1852
+ "eval_samples_per_second": 103.918,
1853
+ "eval_steps_per_second": 2.969,
1854
+ "step": 362
1855
+ },
1856
+ {
1857
+ "epoch": 182.0,
1858
+ "eval_accuracy": 0.6857142857142857,
1859
+ "eval_loss": 0.8190609812736511,
1860
+ "eval_runtime": 0.6501,
1861
+ "eval_samples_per_second": 107.669,
1862
+ "eval_steps_per_second": 3.076,
1863
+ "step": 364
1864
+ },
1865
+ {
1866
+ "epoch": 183.0,
1867
+ "eval_accuracy": 0.6857142857142857,
1868
+ "eval_loss": 0.8187218308448792,
1869
+ "eval_runtime": 0.7205,
1870
+ "eval_samples_per_second": 97.148,
1871
+ "eval_steps_per_second": 2.776,
1872
+ "step": 366
1873
+ },
1874
+ {
1875
+ "epoch": 184.0,
1876
+ "eval_accuracy": 0.7,
1877
+ "eval_loss": 0.8184635639190674,
1878
+ "eval_runtime": 0.656,
1879
+ "eval_samples_per_second": 106.712,
1880
+ "eval_steps_per_second": 3.049,
1881
+ "step": 368
1882
+ },
1883
+ {
1884
+ "epoch": 185.0,
1885
+ "learning_rate": 9.375000000000001e-06,
1886
+ "loss": 0.8893,
1887
+ "step": 370
1888
+ },
1889
+ {
1890
+ "epoch": 185.0,
1891
+ "eval_accuracy": 0.7,
1892
+ "eval_loss": 0.8182028532028198,
1893
+ "eval_runtime": 0.6563,
1894
+ "eval_samples_per_second": 106.666,
1895
+ "eval_steps_per_second": 3.048,
1896
+ "step": 370
1897
+ },
1898
+ {
1899
+ "epoch": 186.0,
1900
+ "eval_accuracy": 0.7,
1901
+ "eval_loss": 0.8179557919502258,
1902
+ "eval_runtime": 0.6961,
1903
+ "eval_samples_per_second": 100.563,
1904
+ "eval_steps_per_second": 2.873,
1905
+ "step": 372
1906
+ },
1907
+ {
1908
+ "epoch": 187.0,
1909
+ "eval_accuracy": 0.7,
1910
+ "eval_loss": 0.8177469372749329,
1911
+ "eval_runtime": 0.6584,
1912
+ "eval_samples_per_second": 106.311,
1913
+ "eval_steps_per_second": 3.037,
1914
+ "step": 374
1915
+ },
1916
+ {
1917
+ "epoch": 188.0,
1918
+ "eval_accuracy": 0.7,
1919
+ "eval_loss": 0.8175888657569885,
1920
+ "eval_runtime": 0.6728,
1921
+ "eval_samples_per_second": 104.046,
1922
+ "eval_steps_per_second": 2.973,
1923
+ "step": 376
1924
+ },
1925
+ {
1926
+ "epoch": 189.0,
1927
+ "eval_accuracy": 0.7,
1928
+ "eval_loss": 0.8174628615379333,
1929
+ "eval_runtime": 0.661,
1930
+ "eval_samples_per_second": 105.894,
1931
+ "eval_steps_per_second": 3.026,
1932
+ "step": 378
1933
+ },
1934
+ {
1935
+ "epoch": 190.0,
1936
+ "learning_rate": 6.25e-06,
1937
+ "loss": 0.8501,
1938
+ "step": 380
1939
+ },
1940
+ {
1941
+ "epoch": 190.0,
1942
+ "eval_accuracy": 0.7,
1943
+ "eval_loss": 0.8172903656959534,
1944
+ "eval_runtime": 0.6643,
1945
+ "eval_samples_per_second": 105.379,
1946
+ "eval_steps_per_second": 3.011,
1947
+ "step": 380
1948
+ },
1949
+ {
1950
+ "epoch": 191.0,
1951
+ "eval_accuracy": 0.7,
1952
+ "eval_loss": 0.8171139359474182,
1953
+ "eval_runtime": 0.7224,
1954
+ "eval_samples_per_second": 96.898,
1955
+ "eval_steps_per_second": 2.769,
1956
+ "step": 382
1957
+ },
1958
+ {
1959
+ "epoch": 192.0,
1960
+ "eval_accuracy": 0.7,
1961
+ "eval_loss": 0.8169858455657959,
1962
+ "eval_runtime": 0.6822,
1963
+ "eval_samples_per_second": 102.605,
1964
+ "eval_steps_per_second": 2.932,
1965
+ "step": 384
1966
+ },
1967
+ {
1968
+ "epoch": 193.0,
1969
+ "eval_accuracy": 0.7,
1970
+ "eval_loss": 0.8169211149215698,
1971
+ "eval_runtime": 0.6488,
1972
+ "eval_samples_per_second": 107.887,
1973
+ "eval_steps_per_second": 3.082,
1974
+ "step": 386
1975
+ },
1976
+ {
1977
+ "epoch": 194.0,
1978
+ "eval_accuracy": 0.7,
1979
+ "eval_loss": 0.8168790340423584,
1980
+ "eval_runtime": 0.8355,
1981
+ "eval_samples_per_second": 83.778,
1982
+ "eval_steps_per_second": 2.394,
1983
+ "step": 388
1984
+ },
1985
+ {
1986
+ "epoch": 195.0,
1987
+ "learning_rate": 3.125e-06,
1988
+ "loss": 0.8611,
1989
+ "step": 390
1990
+ },
1991
+ {
1992
+ "epoch": 195.0,
1993
+ "eval_accuracy": 0.7,
1994
+ "eval_loss": 0.8168440461158752,
1995
+ "eval_runtime": 0.6488,
1996
+ "eval_samples_per_second": 107.884,
1997
+ "eval_steps_per_second": 3.082,
1998
+ "step": 390
1999
+ },
2000
+ {
2001
+ "epoch": 196.0,
2002
+ "eval_accuracy": 0.7,
2003
+ "eval_loss": 0.8168230056762695,
2004
+ "eval_runtime": 0.6602,
2005
+ "eval_samples_per_second": 106.026,
2006
+ "eval_steps_per_second": 3.029,
2007
+ "step": 392
2008
+ },
2009
+ {
2010
+ "epoch": 197.0,
2011
+ "eval_accuracy": 0.7,
2012
+ "eval_loss": 0.8167951107025146,
2013
+ "eval_runtime": 0.8588,
2014
+ "eval_samples_per_second": 81.511,
2015
+ "eval_steps_per_second": 2.329,
2016
+ "step": 394
2017
+ },
2018
+ {
2019
+ "epoch": 198.0,
2020
+ "eval_accuracy": 0.7,
2021
+ "eval_loss": 0.8167835474014282,
2022
+ "eval_runtime": 0.6762,
2023
+ "eval_samples_per_second": 103.513,
2024
+ "eval_steps_per_second": 2.958,
2025
+ "step": 396
2026
+ },
2027
+ {
2028
+ "epoch": 199.0,
2029
+ "eval_accuracy": 0.7,
2030
+ "eval_loss": 0.8167732954025269,
2031
+ "eval_runtime": 0.6596,
2032
+ "eval_samples_per_second": 106.128,
2033
+ "eval_steps_per_second": 3.032,
2034
+ "step": 398
2035
+ },
2036
+ {
2037
+ "epoch": 200.0,
2038
+ "learning_rate": 0.0,
2039
+ "loss": 0.8881,
2040
+ "step": 400
2041
+ },
2042
+ {
2043
+ "epoch": 200.0,
2044
+ "eval_accuracy": 0.7,
2045
+ "eval_loss": 0.8167622089385986,
2046
+ "eval_runtime": 0.844,
2047
+ "eval_samples_per_second": 82.939,
2048
+ "eval_steps_per_second": 2.37,
2049
+ "step": 400
2050
+ },
2051
+ {
2052
+ "epoch": 200.0,
2053
+ "step": 400,
2054
+ "total_flos": 2.2371640252416e+18,
2055
+ "train_loss": 0.9259392237663269,
2056
+ "train_runtime": 1042.9233,
2057
+ "train_samples_per_second": 86.296,
2058
+ "train_steps_per_second": 0.384
2059
+ }
2060
+ ],
2061
+ "logging_steps": 10,
2062
+ "max_steps": 400,
2063
+ "num_train_epochs": 200,
2064
+ "save_steps": 500,
2065
+ "total_flos": 2.2371640252416e+18,
2066
+ "trial_name": null,
2067
+ "trial_params": null
2068
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d4c4774af85a46fd62dc09e647d1b55a45135826a5db57bb78e91c0c297860e9
3
+ size 4091