vishalkatheriya18 committed b813155
1 parent: c5a24b6

End of training

README.md ADDED
@@ -0,0 +1,154 @@
+ ---
+ license: apache-2.0
+ base_model: facebook/convnextv2-tiny-1k-224
+ tags:
+ - generated_from_trainer
+ datasets:
+ - imagefolder
+ metrics:
+ - accuracy
+ model-index:
+ - name: convnextv2-tiny-1k-224-finetuned-sleeve-length
+   results:
+   - task:
+       name: Image Classification
+       type: image-classification
+     dataset:
+       name: imagefolder
+       type: imagefolder
+       config: default
+       split: train
+       args: default
+     metrics:
+     - name: Accuracy
+       type: accuracy
+       value: 0.8620689655172413
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # convnextv2-tiny-1k-224-finetuned-sleeve-length
+
+ This model is a fine-tuned version of [facebook/convnextv2-tiny-1k-224](https://huggingface.co/facebook/convnextv2-tiny-1k-224) on the imagefolder dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.5496
+ - Accuracy: 0.8621
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-05
+ - train_batch_size: 32
+ - eval_batch_size: 32
+ - seed: 42
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 128
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 80
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Accuracy |
+ |:-------------:|:-----:|:----:|:---------------:|:--------:|
+ | No log | 0.96 | 6 | 1.7957 | 0.2299 |
+ | 1.8656 | 1.92 | 12 | 1.7704 | 0.2759 |
+ | 1.8656 | 2.88 | 18 | 1.7382 | 0.3218 |
+ | 1.7835 | 4.0 | 25 | 1.6674 | 0.3793 |
+ | 1.664 | 4.96 | 31 | 1.5982 | 0.4253 |
+ | 1.664 | 5.92 | 37 | 1.4861 | 0.4368 |
+ | 1.5072 | 6.88 | 43 | 1.3645 | 0.4713 |
+ | 1.3304 | 8.0 | 50 | 1.2859 | 0.4598 |
+ | 1.3304 | 8.96 | 56 | 1.2796 | 0.4713 |
+ | 1.1651 | 9.92 | 62 | 1.2456 | 0.5172 |
+ | 1.1651 | 10.88 | 68 | 1.1667 | 0.5402 |
+ | 1.0876 | 12.0 | 75 | 1.1510 | 0.5632 |
+ | 1.0046 | 12.96 | 81 | 1.0510 | 0.6092 |
+ | 1.0046 | 13.92 | 87 | 1.0338 | 0.5862 |
+ | 0.9465 | 14.88 | 93 | 0.9883 | 0.5862 |
+ | 0.8699 | 16.0 | 100 | 0.9882 | 0.5632 |
+ | 0.8699 | 16.96 | 106 | 0.9276 | 0.5747 |
+ | 0.7969 | 17.92 | 112 | 0.9145 | 0.5862 |
+ | 0.7969 | 18.88 | 118 | 0.8144 | 0.6667 |
+ | 0.7254 | 20.0 | 125 | 0.7587 | 0.6667 |
+ | 0.6447 | 20.96 | 131 | 0.6990 | 0.7471 |
+ | 0.6447 | 21.92 | 137 | 0.7042 | 0.7241 |
+ | 0.6021 | 22.88 | 143 | 0.6526 | 0.7701 |
+ | 0.516 | 24.0 | 150 | 0.6485 | 0.8046 |
+ | 0.516 | 24.96 | 156 | 0.5803 | 0.8161 |
+ | 0.4497 | 25.92 | 162 | 0.6085 | 0.8046 |
+ | 0.4497 | 26.88 | 168 | 0.6095 | 0.8046 |
+ | 0.3935 | 28.0 | 175 | 0.5372 | 0.8276 |
+ | 0.3321 | 28.96 | 181 | 0.5829 | 0.8161 |
+ | 0.3321 | 29.92 | 187 | 0.6205 | 0.8161 |
+ | 0.3007 | 30.88 | 193 | 0.5150 | 0.8276 |
+ | 0.2618 | 32.0 | 200 | 0.6069 | 0.8391 |
+ | 0.2618 | 32.96 | 206 | 0.5273 | 0.8391 |
+ | 0.2411 | 33.92 | 212 | 0.4727 | 0.8621 |
+ | 0.2411 | 34.88 | 218 | 0.4611 | 0.8736 |
+ | 0.2108 | 36.0 | 225 | 0.5696 | 0.8506 |
+ | 0.2143 | 36.96 | 231 | 0.4944 | 0.8621 |
+ | 0.2143 | 37.92 | 237 | 0.5628 | 0.8161 |
+ | 0.1663 | 38.88 | 243 | 0.6131 | 0.8046 |
+ | 0.1714 | 40.0 | 250 | 0.4962 | 0.8506 |
+ | 0.1714 | 40.96 | 256 | 0.5023 | 0.8391 |
+ | 0.174 | 41.92 | 262 | 0.4842 | 0.8276 |
+ | 0.174 | 42.88 | 268 | 0.4679 | 0.8276 |
+ | 0.138 | 44.0 | 275 | 0.6271 | 0.8161 |
+ | 0.1437 | 44.96 | 281 | 0.5326 | 0.8506 |
+ | 0.1437 | 45.92 | 287 | 0.5655 | 0.8161 |
+ | 0.136 | 46.88 | 293 | 0.4672 | 0.8391 |
+ | 0.1401 | 48.0 | 300 | 0.4990 | 0.8621 |
+ | 0.1401 | 48.96 | 306 | 0.5445 | 0.8276 |
+ | 0.1281 | 49.92 | 312 | 0.4761 | 0.8736 |
+ | 0.1281 | 50.88 | 318 | 0.5665 | 0.8506 |
+ | 0.1156 | 52.0 | 325 | 0.5090 | 0.8506 |
+ | 0.0981 | 52.96 | 331 | 0.5152 | 0.8506 |
+ | 0.0981 | 53.92 | 337 | 0.5466 | 0.8161 |
+ | 0.1055 | 54.88 | 343 | 0.5390 | 0.8276 |
+ | 0.112 | 56.0 | 350 | 0.5574 | 0.8506 |
+ | 0.112 | 56.96 | 356 | 0.5449 | 0.8506 |
+ | 0.0855 | 57.92 | 362 | 0.5390 | 0.8506 |
+ | 0.0855 | 58.88 | 368 | 0.5206 | 0.8506 |
+ | 0.0899 | 60.0 | 375 | 0.5476 | 0.8621 |
+ | 0.1026 | 60.96 | 381 | 0.5344 | 0.8506 |
+ | 0.1026 | 61.92 | 387 | 0.5531 | 0.8391 |
+ | 0.0799 | 62.88 | 393 | 0.5723 | 0.8276 |
+ | 0.0844 | 64.0 | 400 | 0.5340 | 0.8161 |
+ | 0.0844 | 64.96 | 406 | 0.5236 | 0.8736 |
+ | 0.0724 | 65.92 | 412 | 0.6137 | 0.8391 |
+ | 0.0724 | 66.88 | 418 | 0.5825 | 0.8276 |
+ | 0.0867 | 68.0 | 425 | 0.5105 | 0.8621 |
+ | 0.071 | 68.96 | 431 | 0.5272 | 0.8506 |
+ | 0.071 | 69.92 | 437 | 0.5524 | 0.8506 |
+ | 0.0723 | 70.88 | 443 | 0.5508 | 0.8391 |
+ | 0.0748 | 72.0 | 450 | 0.5689 | 0.8161 |
+ | 0.0748 | 72.96 | 456 | 0.5556 | 0.8506 |
+ | 0.0589 | 73.92 | 462 | 0.5452 | 0.8506 |
+ | 0.0589 | 74.88 | 468 | 0.5475 | 0.8621 |
+ | 0.0719 | 76.0 | 475 | 0.5484 | 0.8621 |
+ | 0.0801 | 76.8 | 480 | 0.5496 | 0.8621 |
+
+
+ ### Framework versions
+
+ - Transformers 4.44.0
+ - Pytorch 2.4.0
+ - Datasets 2.21.0
+ - Tokenizers 0.19.1
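The model card's hyperparameters (`learning_rate: 5e-05`, `lr_scheduler_type: linear`, `lr_scheduler_warmup_ratio: 0.1`) can be checked against the `learning_rate` values logged in `trainer_state.json`. The sketch below assumes two things:

- The schedule spans the 480 optimizer steps recorded as `global_step`.
- The warmup length is `ceil(0.1 * 480) = 48` steps. This is how I understand the Trainer turns a warmup ratio into a step count.

`linear_lr` is a hypothetical helper, not a library function.

```python
import math

BASE_LR = 5e-05      # learning_rate from the model card
TOTAL_STEPS = 480    # global_step recorded in trainer_state.json
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the model card


def linear_lr(step: int,
              base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding; 48 here
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr (end of warmup) to 0 (last step).
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    for step in (10, 50, 300, 400):
        print(step, linear_lr(step))
```

Under these assumptions, the values for steps 10, 50, 300 and 400 agree with the `learning_rate` entries logged at those steps. For example, step 50 gives 5e-05 × 430/432.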
all_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "epoch": 76.8,
+ "eval_accuracy": 0.8620689655172413,
+ "eval_loss": 0.5496163368225098,
+ "eval_runtime": 2.0578,
+ "eval_samples_per_second": 42.279,
+ "eval_steps_per_second": 1.458,
+ "total_flos": 1.514063180200919e+18,
+ "train_loss": 0.449434948215882,
+ "train_runtime": 1985.4684,
+ "train_samples_per_second": 31.549,
+ "train_steps_per_second": 0.242
+ }
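The final `epoch` of 76.8, rather than the configured `num_epochs: 80`, appears to follow from gradient accumulation. Below is a rough simulation under these assumptions:

- Each epoch has 25 micro-batches. This is inferred from the first evaluation being logged at epoch 0.96 = 24/25.
- The micro-batch counter carries over between epochs.
- The step budget is `ceil(80 * floor(25 / 4)) = 480`.

This reflects my reading of the Trainer version used here (4.44.0), not a documented guarantee. `simulate_run` is a hypothetical helper.

```python
import math


def simulate_run(batches_per_epoch: int, grad_accum: int, num_epochs: float):
    """Replay the optimizer-step counter; return (max_steps, end-of-epoch points).

    Each point is (global_step, fractional_epoch) at the last optimizer step of an
    epoch, which is where an epoch-level evaluation would be logged.
    """
    # Assumed: update steps per epoch are floored, then the step budget is rounded up.
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    max_steps = math.ceil(num_epochs * updates_per_epoch)

    global_step = 0
    seen_batches = 0   # micro-batch counter; assumed NOT reset between epochs
    epoch_value = 0.0  # fractional epoch recorded at the latest optimizer step
    points = []

    for epoch in range(math.ceil(num_epochs)):
        for batch_idx in range(batches_per_epoch):
            seen_batches += 1
            if seen_batches % grad_accum != 0:
                continue  # still accumulating gradients
            global_step += 1
            epoch_value = round(epoch + (batch_idx + 1) / batches_per_epoch, 2)
            if global_step >= max_steps:
                points.append((global_step, epoch_value))
                return max_steps, points  # budget exhausted before the last epoch ends
        points.append((global_step, epoch_value))
    return max_steps, points


if __name__ == "__main__":
    max_steps, points = simulate_run(batches_per_epoch=25, grad_accum=4, num_epochs=80)
    print(max_steps, points[:4], points[-2:])
```

With these inputs, the simulated points match the first rows of the Training results table: steps 6, 12, 18 and 25 at epochs 0.96, 1.92, 2.88 and 4.0. They also match its last two rows: step 475 at 76.0 and step 480 at 76.8. Training therefore stops at step 480, before epoch 80 is reached.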
config.json ADDED
@@ -0,0 +1,59 @@
+ {
+ "_name_or_path": "facebook/convnextv2-tiny-1k-224",
+ "architectures": [
+ "ConvNextV2ForImageClassification"
+ ],
+ "depths": [
+ 3,
+ 3,
+ 9,
+ 3
+ ],
+ "drop_path_rate": 0.0,
+ "hidden_act": "gelu",
+ "hidden_sizes": [
+ 96,
+ 192,
+ 384,
+ 768
+ ],
+ "id2label": {
+ "0": "bell sleeve",
+ "1": "half sleeve",
+ "2": "long sleeve",
+ "3": "short sleeve",
+ "4": "sleeveless",
+ "5": "three-quarter sleeves"
+ },
+ "image_size": 224,
+ "initializer_range": 0.02,
+ "label2id": {
+ "bell sleeve": 0,
+ "half sleeve": 1,
+ "long sleeve": 2,
+ "short sleeve": 3,
+ "sleeveless": 4,
+ "three-quarter sleeves": 5
+ },
+ "layer_norm_eps": 1e-12,
+ "model_type": "convnextv2",
+ "num_channels": 3,
+ "num_stages": 4,
+ "out_features": [
+ "stage4"
+ ],
+ "out_indices": [
+ 4
+ ],
+ "patch_size": 4,
+ "problem_type": "single_label_classification",
+ "stage_names": [
+ "stem",
+ "stage1",
+ "stage2",
+ "stage3",
+ "stage4"
+ ],
+ "torch_dtype": "float32",
+ "transformers_version": "4.44.0"
+ }
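The `id2label` table maps the six outputs of `ConvNextV2ForImageClassification` to sleeve-length names. The sketch below turns one logit vector into a label with a numerically stable softmax, using no dependencies. The logits in the usage line are made up for illustration; in practice they would come from the model's `outputs.logits`.

```python
import math

# Copied from id2label in config.json (keys are strings in JSON, ints here).
ID2LABEL = {
    0: "bell sleeve",
    1: "half sleeve",
    2: "long sleeve",
    3: "short sleeve",
    4: "sleeveless",
    5: "three-quarter sleeves",
}


def decode(logits):
    """Return (label, probability) for the highest-scoring class of one image."""
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    # Subtract the max logit before exponentiating to avoid overflow.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)  # ties resolve to the lowest id
    return ID2LABEL[best], probs[best]


if __name__ == "__main__":
    # Hypothetical logits for a single image, for illustration only.
    print(decode([0.1, -1.2, 3.4, 0.7, -0.5, 1.9]))
```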
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 76.8,
+ "eval_accuracy": 0.8620689655172413,
+ "eval_loss": 0.5496163368225098,
+ "eval_runtime": 2.0578,
+ "eval_samples_per_second": 42.279,
+ "eval_steps_per_second": 1.458
+ }
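The evaluation figures are consistent with a small held-out set. Assume `eval_samples_per_second` is simply the number of evaluated images divided by `eval_runtime`. Their product then estimates the set size, and the accuracy corresponds to a whole number of correct predictions. This is an inference from the logged numbers, not something the model card states. The helper names are hypothetical.

```python
EVAL_RUNTIME = 2.0578             # eval_runtime in seconds (rounded in the log)
EVAL_SAMPLES_PER_SECOND = 42.279  # eval_samples_per_second (rounded in the log)
EVAL_ACCURACY = 0.8620689655172413


def implied_eval_size(runtime: float, samples_per_second: float) -> int:
    """Estimate the number of evaluated images from runtime and throughput."""
    return round(runtime * samples_per_second)


def correct_count(accuracy: float, n: int) -> int:
    """Number of correct predictions implied by an accuracy over n images."""
    return round(accuracy * n)


if __name__ == "__main__":
    n = implied_eval_size(EVAL_RUNTIME, EVAL_SAMPLES_PER_SECOND)
    k = correct_count(EVAL_ACCURACY, n)
    print(f"{k}/{n} correct; one image = {100 / n:.2f} accuracy points")
```

If the set really has 87 images, one prediction moves accuracy by about 1.15 points. The late-training swings in the Training results table, between 0.8161 and 0.8736, would then amount to about five images.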
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7e280ffcf375e022f942b34fd32ddf23b0251721c089464e933379b60366f542
+ size 111508128
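`model.safetensors` is committed as a Git LFS pointer. The diff shows only three fields: `version`, `oid` (the SHA-256 of the real file) and `size` in bytes. The sketch below parses such a pointer and checks a downloaded copy against it, using only the standard library. The function names and the local path are hypothetical.

```python
import hashlib
from pathlib import Path

POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:7e280ffcf375e022f942b34fd32ddf23b0251721c089464e933379b60366f542
size 111508128
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Stream the file once, comparing its byte count and SHA-256 with the pointer."""
    digest = hashlib.sha256()
    size = 0
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and digest.hexdigest() == pointer["sha256"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    print(pointer["size"], pointer["sha256"][:12])
    local = Path("model.safetensors")  # hypothetical local download
    if local.exists():
        print("matches pointer:", matches_pointer(local, pointer))
```

`config.json` gives `torch_dtype: float32`, which uses 4 bytes per weight. At that rate, 111,508,128 bytes corresponds to at most about 27.9 million parameters, because the file also contains a small JSON header.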
preprocessor_config.json ADDED
@@ -0,0 +1,22 @@
+ {
+ "crop_pct": 0.875,
+ "do_normalize": true,
+ "do_rescale": true,
+ "do_resize": true,
+ "image_mean": [
+ 0.485,
+ 0.456,
+ 0.406
+ ],
+ "image_processor_type": "ConvNextImageProcessor",
+ "image_std": [
+ 0.229,
+ 0.224,
+ 0.225
+ ],
+ "resample": 3,
+ "rescale_factor": 0.00392156862745098,
+ "size": {
+ "shortest_edge": 224
+ }
+ }
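This config sets `size.shortest_edge` to 224 and `crop_pct` to 0.875. For `shortest_edge` below 384, my understanding is that `ConvNextImageProcessor` applies these steps:

1. Resize the short side to `int(224 / 0.875) = 256`, keeping the aspect ratio, with bicubic resampling (`resample: 3`).
2. Center-crop to 224×224.
3. Rescale by 1/255.
4. Normalize with the mean and std listed here.

The sketch below reproduces only the size arithmetic and the per-pixel rescale/normalize step, so the numbers can be checked by hand. It is not a replacement for the real processor, and the helper names are hypothetical.

```python
# Values copied from preprocessor_config.json.
SHORTEST_EDGE = 224
CROP_PCT = 0.875
RESCALE_FACTOR = 0.00392156862745098  # 1/255
IMAGE_MEAN = (0.485, 0.456, 0.406)
IMAGE_STD = (0.229, 0.224, 0.225)


def resize_then_crop_sizes(height: int, width: int):
    """Return ((resized_h, resized_w), (crop_h, crop_w)) for a height x width input.

    Assumes the shortest_edge < 384 path: scale the short side to
    int(shortest_edge / crop_pct), keep the aspect ratio, then center-crop a square.
    """
    target_short = int(SHORTEST_EDGE / CROP_PCT)  # 256 for this config
    short, long_side = min(height, width), max(height, width)
    new_long = int(target_short * long_side / short)
    if height <= width:
        resized = (target_short, new_long)  # landscape or square
    else:
        resized = (new_long, target_short)  # portrait
    return resized, (SHORTEST_EDGE, SHORTEST_EDGE)


def normalize_pixel(rgb):
    """Rescale one 0-255 RGB pixel to [0, 1], then apply per-channel mean/std."""
    return tuple((v * RESCALE_FACTOR - m) / s for v, m, s in zip(rgb, IMAGE_MEAN, IMAGE_STD))


if __name__ == "__main__":
    print(resize_then_crop_sizes(480, 640))
    print(normalize_pixel((255, 255, 255)))
```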
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 76.8,
+ "total_flos": 1.514063180200919e+18,
+ "train_loss": 0.449434948215882,
+ "train_runtime": 1985.4684,
+ "train_samples_per_second": 31.549,
+ "train_steps_per_second": 0.242
+ }
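The training throughput fields allow a rough consistency check on the size of the training split. This assumes `train_samples_per_second` is (training examples × `num_epochs`) / `train_runtime`, which is my understanding when `max_steps` is not set explicitly. Treat the result as an estimate, since both logged inputs are rounded. `implied_train_size` is a hypothetical helper.

```python
import math

TRAIN_RUNTIME = 1985.4684          # train_runtime in seconds
TRAIN_SAMPLES_PER_SECOND = 31.549  # train_samples_per_second
NUM_EPOCHS = 80                    # num_epochs from the model card
TRAIN_BATCH_SIZE = 32              # train_batch_size (per device)
GRAD_ACCUM = 4                     # gradient_accumulation_steps


def implied_train_size(runtime: float, samples_per_second: float, num_epochs: int) -> int:
    """Estimate examples per epoch from the aggregate training throughput."""
    return round(runtime * samples_per_second / num_epochs)


if __name__ == "__main__":
    n_train = implied_train_size(TRAIN_RUNTIME, TRAIN_SAMPLES_PER_SECOND, NUM_EPOCHS)
    batches = math.ceil(n_train / TRAIN_BATCH_SIZE)  # last batch may be partial
    updates = batches // GRAD_ACCUM                  # full accumulation cycles per epoch
    print(n_train, batches, updates, TRAIN_BATCH_SIZE * GRAD_ACCUM)
```

Under this assumption the split holds roughly 783 images. That works out to 25 batches of up to 32 per epoch and 6 full accumulation cycles of 4 batches. The effective batch is 128, matching `total_train_batch_size`.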
trainer_state.json ADDED
@@ -0,0 +1,1071 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 76.8,
5
+ "eval_steps": 500,
6
+ "global_step": 480,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.96,
13
+ "eval_accuracy": 0.22988505747126436,
14
+ "eval_loss": 1.7957335710525513,
15
+ "eval_runtime": 2.3499,
16
+ "eval_samples_per_second": 37.022,
17
+ "eval_steps_per_second": 1.277,
18
+ "step": 6
19
+ },
20
+ {
21
+ "epoch": 1.6,
22
+ "grad_norm": 5.402665615081787,
23
+ "learning_rate": 1.0416666666666668e-05,
24
+ "loss": 1.8656,
25
+ "step": 10
26
+ },
27
+ {
28
+ "epoch": 1.92,
29
+ "eval_accuracy": 0.27586206896551724,
30
+ "eval_loss": 1.7703895568847656,
31
+ "eval_runtime": 2.2621,
32
+ "eval_samples_per_second": 38.46,
33
+ "eval_steps_per_second": 1.326,
34
+ "step": 12
35
+ },
36
+ {
37
+ "epoch": 2.88,
38
+ "eval_accuracy": 0.3218390804597701,
39
+ "eval_loss": 1.738166332244873,
40
+ "eval_runtime": 2.047,
41
+ "eval_samples_per_second": 42.502,
42
+ "eval_steps_per_second": 1.466,
43
+ "step": 18
44
+ },
45
+ {
46
+ "epoch": 3.2,
47
+ "grad_norm": 9.074636459350586,
48
+ "learning_rate": 2.0833333333333336e-05,
49
+ "loss": 1.7835,
50
+ "step": 20
51
+ },
52
+ {
53
+ "epoch": 4.0,
54
+ "eval_accuracy": 0.3793103448275862,
55
+ "eval_loss": 1.6673917770385742,
56
+ "eval_runtime": 2.0721,
57
+ "eval_samples_per_second": 41.987,
58
+ "eval_steps_per_second": 1.448,
59
+ "step": 25
60
+ },
61
+ {
62
+ "epoch": 4.8,
63
+ "grad_norm": 23.296255111694336,
64
+ "learning_rate": 3.125e-05,
65
+ "loss": 1.664,
66
+ "step": 30
67
+ },
68
+ {
69
+ "epoch": 4.96,
70
+ "eval_accuracy": 0.42528735632183906,
71
+ "eval_loss": 1.5981522798538208,
72
+ "eval_runtime": 2.0749,
73
+ "eval_samples_per_second": 41.93,
74
+ "eval_steps_per_second": 1.446,
75
+ "step": 31
76
+ },
77
+ {
78
+ "epoch": 5.92,
79
+ "eval_accuracy": 0.4367816091954023,
80
+ "eval_loss": 1.4861106872558594,
81
+ "eval_runtime": 2.0842,
82
+ "eval_samples_per_second": 41.743,
83
+ "eval_steps_per_second": 1.439,
84
+ "step": 37
85
+ },
86
+ {
87
+ "epoch": 6.4,
88
+ "grad_norm": 36.56019592285156,
89
+ "learning_rate": 4.166666666666667e-05,
90
+ "loss": 1.5072,
91
+ "step": 40
92
+ },
93
+ {
94
+ "epoch": 6.88,
95
+ "eval_accuracy": 0.47126436781609193,
96
+ "eval_loss": 1.3644713163375854,
97
+ "eval_runtime": 2.0344,
98
+ "eval_samples_per_second": 42.765,
99
+ "eval_steps_per_second": 1.475,
100
+ "step": 43
101
+ },
102
+ {
103
+ "epoch": 8.0,
104
+ "grad_norm": 70.99100494384766,
105
+ "learning_rate": 4.976851851851852e-05,
106
+ "loss": 1.3304,
107
+ "step": 50
108
+ },
109
+ {
110
+ "epoch": 8.0,
111
+ "eval_accuracy": 0.45977011494252873,
112
+ "eval_loss": 1.285918116569519,
113
+ "eval_runtime": 2.0671,
114
+ "eval_samples_per_second": 42.088,
115
+ "eval_steps_per_second": 1.451,
116
+ "step": 50
117
+ },
118
+ {
119
+ "epoch": 8.96,
120
+ "eval_accuracy": 0.47126436781609193,
121
+ "eval_loss": 1.2795610427856445,
122
+ "eval_runtime": 2.0462,
123
+ "eval_samples_per_second": 42.519,
124
+ "eval_steps_per_second": 1.466,
125
+ "step": 56
126
+ },
127
+ {
128
+ "epoch": 9.6,
129
+ "grad_norm": 60.155181884765625,
130
+ "learning_rate": 4.8611111111111115e-05,
131
+ "loss": 1.1651,
132
+ "step": 60
133
+ },
134
+ {
135
+ "epoch": 9.92,
136
+ "eval_accuracy": 0.5172413793103449,
137
+ "eval_loss": 1.2455964088439941,
138
+ "eval_runtime": 2.0479,
139
+ "eval_samples_per_second": 42.483,
140
+ "eval_steps_per_second": 1.465,
141
+ "step": 62
142
+ },
143
+ {
144
+ "epoch": 10.88,
145
+ "eval_accuracy": 0.5402298850574713,
146
+ "eval_loss": 1.1666686534881592,
147
+ "eval_runtime": 2.0486,
148
+ "eval_samples_per_second": 42.468,
149
+ "eval_steps_per_second": 1.464,
150
+ "step": 68
151
+ },
152
+ {
153
+ "epoch": 11.2,
154
+ "grad_norm": 17.20172119140625,
155
+ "learning_rate": 4.745370370370371e-05,
156
+ "loss": 1.0876,
157
+ "step": 70
158
+ },
159
+ {
160
+ "epoch": 12.0,
161
+ "eval_accuracy": 0.5632183908045977,
162
+ "eval_loss": 1.1510032415390015,
163
+ "eval_runtime": 2.0486,
164
+ "eval_samples_per_second": 42.468,
165
+ "eval_steps_per_second": 1.464,
166
+ "step": 75
167
+ },
168
+ {
169
+ "epoch": 12.8,
170
+ "grad_norm": 98.55331420898438,
171
+ "learning_rate": 4.62962962962963e-05,
172
+ "loss": 1.0046,
173
+ "step": 80
174
+ },
175
+ {
176
+ "epoch": 12.96,
177
+ "eval_accuracy": 0.6091954022988506,
178
+ "eval_loss": 1.0509852170944214,
179
+ "eval_runtime": 2.2,
180
+ "eval_samples_per_second": 39.546,
181
+ "eval_steps_per_second": 1.364,
182
+ "step": 81
183
+ },
184
+ {
185
+ "epoch": 13.92,
186
+ "eval_accuracy": 0.5862068965517241,
187
+ "eval_loss": 1.033838152885437,
188
+ "eval_runtime": 2.0133,
189
+ "eval_samples_per_second": 43.212,
190
+ "eval_steps_per_second": 1.49,
191
+ "step": 87
192
+ },
193
+ {
194
+ "epoch": 14.4,
195
+ "grad_norm": 53.443302154541016,
196
+ "learning_rate": 4.5138888888888894e-05,
197
+ "loss": 0.9465,
198
+ "step": 90
199
+ },
200
+ {
201
+ "epoch": 14.88,
202
+ "eval_accuracy": 0.5862068965517241,
203
+ "eval_loss": 0.9883113503456116,
204
+ "eval_runtime": 2.05,
205
+ "eval_samples_per_second": 42.439,
206
+ "eval_steps_per_second": 1.463,
207
+ "step": 93
208
+ },
209
+ {
210
+ "epoch": 16.0,
211
+ "grad_norm": 30.475088119506836,
212
+ "learning_rate": 4.3981481481481486e-05,
213
+ "loss": 0.8699,
214
+ "step": 100
215
+ },
216
+ {
217
+ "epoch": 16.0,
218
+ "eval_accuracy": 0.5632183908045977,
219
+ "eval_loss": 0.9881502389907837,
220
+ "eval_runtime": 2.0881,
221
+ "eval_samples_per_second": 41.664,
222
+ "eval_steps_per_second": 1.437,
223
+ "step": 100
224
+ },
225
+ {
226
+ "epoch": 16.96,
227
+ "eval_accuracy": 0.5747126436781609,
228
+ "eval_loss": 0.9276102781295776,
229
+ "eval_runtime": 2.0889,
230
+ "eval_samples_per_second": 41.648,
231
+ "eval_steps_per_second": 1.436,
232
+ "step": 106
233
+ },
234
+ {
235
+ "epoch": 17.6,
236
+ "grad_norm": 21.802074432373047,
237
+ "learning_rate": 4.282407407407408e-05,
238
+ "loss": 0.7969,
239
+ "step": 110
240
+ },
241
+ {
242
+ "epoch": 17.92,
243
+ "eval_accuracy": 0.5862068965517241,
244
+ "eval_loss": 0.9144545197486877,
245
+ "eval_runtime": 2.0314,
246
+ "eval_samples_per_second": 42.828,
247
+ "eval_steps_per_second": 1.477,
248
+ "step": 112
249
+ },
250
+ {
251
+ "epoch": 18.88,
252
+ "eval_accuracy": 0.6666666666666666,
253
+ "eval_loss": 0.8143898844718933,
254
+ "eval_runtime": 2.0134,
255
+ "eval_samples_per_second": 43.21,
256
+ "eval_steps_per_second": 1.49,
257
+ "step": 118
258
+ },
259
+ {
260
+ "epoch": 19.2,
261
+ "grad_norm": 58.785552978515625,
262
+ "learning_rate": 4.166666666666667e-05,
263
+ "loss": 0.7254,
264
+ "step": 120
265
+ },
266
+ {
267
+ "epoch": 20.0,
268
+ "eval_accuracy": 0.6666666666666666,
269
+ "eval_loss": 0.7586901187896729,
270
+ "eval_runtime": 2.0526,
271
+ "eval_samples_per_second": 42.386,
272
+ "eval_steps_per_second": 1.462,
273
+ "step": 125
274
+ },
275
+ {
276
+ "epoch": 20.8,
277
+ "grad_norm": 24.079566955566406,
278
+ "learning_rate": 4.0509259259259265e-05,
279
+ "loss": 0.6447,
280
+ "step": 130
281
+ },
282
+ {
283
+ "epoch": 20.96,
284
+ "eval_accuracy": 0.7471264367816092,
285
+ "eval_loss": 0.6990374326705933,
286
+ "eval_runtime": 2.0625,
287
+ "eval_samples_per_second": 42.182,
288
+ "eval_steps_per_second": 1.455,
289
+ "step": 131
290
+ },
291
+ {
292
+ "epoch": 21.92,
293
+ "eval_accuracy": 0.7241379310344828,
294
+ "eval_loss": 0.7041503190994263,
295
+ "eval_runtime": 2.0267,
296
+ "eval_samples_per_second": 42.926,
297
+ "eval_steps_per_second": 1.48,
298
+ "step": 137
299
+ },
300
+ {
301
+ "epoch": 22.4,
302
+ "grad_norm": 24.671459197998047,
303
+ "learning_rate": 3.935185185185186e-05,
304
+ "loss": 0.6021,
305
+ "step": 140
306
+ },
307
+ {
308
+ "epoch": 22.88,
309
+ "eval_accuracy": 0.7701149425287356,
310
+ "eval_loss": 0.6526182293891907,
311
+ "eval_runtime": 2.1122,
312
+ "eval_samples_per_second": 41.189,
313
+ "eval_steps_per_second": 1.42,
314
+ "step": 143
315
+ },
316
+ {
317
+ "epoch": 24.0,
318
+ "grad_norm": 55.466121673583984,
319
+ "learning_rate": 3.8194444444444444e-05,
320
+ "loss": 0.516,
321
+ "step": 150
322
+ },
323
+ {
324
+ "epoch": 24.0,
325
+ "eval_accuracy": 0.8045977011494253,
326
+ "eval_loss": 0.6485260128974915,
327
+ "eval_runtime": 2.0692,
328
+ "eval_samples_per_second": 42.046,
329
+ "eval_steps_per_second": 1.45,
330
+ "step": 150
331
+ },
332
+ {
333
+ "epoch": 24.96,
334
+ "eval_accuracy": 0.8160919540229885,
335
+ "eval_loss": 0.5802629590034485,
336
+ "eval_runtime": 2.0421,
337
+ "eval_samples_per_second": 42.603,
338
+ "eval_steps_per_second": 1.469,
339
+ "step": 156
340
+ },
341
+ {
342
+ "epoch": 25.6,
343
+ "grad_norm": 17.66895294189453,
344
+ "learning_rate": 3.7037037037037037e-05,
345
+ "loss": 0.4497,
346
+ "step": 160
347
+ },
348
+ {
349
+ "epoch": 25.92,
350
+ "eval_accuracy": 0.8045977011494253,
351
+ "eval_loss": 0.6084781289100647,
352
+ "eval_runtime": 2.0191,
353
+ "eval_samples_per_second": 43.088,
354
+ "eval_steps_per_second": 1.486,
355
+ "step": 162
356
+ },
357
+ {
358
+ "epoch": 26.88,
359
+ "eval_accuracy": 0.8045977011494253,
360
+ "eval_loss": 0.6094852685928345,
361
+ "eval_runtime": 1.9897,
362
+ "eval_samples_per_second": 43.724,
363
+ "eval_steps_per_second": 1.508,
364
+ "step": 168
365
+ },
366
+ {
367
+ "epoch": 27.2,
368
+ "grad_norm": 31.39649200439453,
369
+ "learning_rate": 3.587962962962963e-05,
370
+ "loss": 0.3935,
371
+ "step": 170
372
+ },
373
+ {
374
+ "epoch": 28.0,
375
+ "eval_accuracy": 0.8275862068965517,
376
+ "eval_loss": 0.5372287034988403,
377
+ "eval_runtime": 2.0637,
378
+ "eval_samples_per_second": 42.158,
379
+ "eval_steps_per_second": 1.454,
380
+ "step": 175
381
+ },
382
+ {
383
+ "epoch": 28.8,
384
+ "grad_norm": 31.86827278137207,
385
+ "learning_rate": 3.472222222222222e-05,
386
+ "loss": 0.3321,
387
+ "step": 180
388
+ },
389
+ {
390
+ "epoch": 28.96,
391
+ "eval_accuracy": 0.8160919540229885,
392
+ "eval_loss": 0.5828755497932434,
393
+ "eval_runtime": 2.1428,
394
+ "eval_samples_per_second": 40.6,
395
+ "eval_steps_per_second": 1.4,
396
+ "step": 181
397
+ },
398
+ {
399
+ "epoch": 29.92,
400
+ "eval_accuracy": 0.8160919540229885,
401
+ "eval_loss": 0.6204901337623596,
402
+ "eval_runtime": 2.0154,
403
+ "eval_samples_per_second": 43.168,
404
+ "eval_steps_per_second": 1.489,
405
+ "step": 187
406
+ },
407
+ {
408
+ "epoch": 30.4,
409
+ "grad_norm": 42.88612747192383,
410
+ "learning_rate": 3.3564814814814815e-05,
411
+ "loss": 0.3007,
412
+ "step": 190
413
+ },
414
+ {
415
+ "epoch": 30.88,
416
+ "eval_accuracy": 0.8275862068965517,
417
+ "eval_loss": 0.5149825811386108,
418
+ "eval_runtime": 2.0492,
419
+ "eval_samples_per_second": 42.456,
420
+ "eval_steps_per_second": 1.464,
421
+ "step": 193
422
+ },
423
+ {
424
+ "epoch": 32.0,
425
+ "grad_norm": 30.13237190246582,
426
+ "learning_rate": 3.240740740740741e-05,
427
+ "loss": 0.2618,
428
+ "step": 200
429
+ },
430
+ {
431
+ "epoch": 32.0,
432
+ "eval_accuracy": 0.8390804597701149,
433
+ "eval_loss": 0.6068965196609497,
434
+ "eval_runtime": 2.0657,
435
+ "eval_samples_per_second": 42.117,
436
+ "eval_steps_per_second": 1.452,
437
+ "step": 200
438
+ },
439
+ {
440
+ "epoch": 32.96,
441
+ "eval_accuracy": 0.8390804597701149,
442
+ "eval_loss": 0.5272508859634399,
443
+ "eval_runtime": 2.0395,
444
+ "eval_samples_per_second": 42.657,
445
+ "eval_steps_per_second": 1.471,
446
+ "step": 206
447
+ },
448
+ {
449
+ "epoch": 33.6,
450
+ "grad_norm": 24.97075080871582,
451
+ "learning_rate": 3.125e-05,
452
+ "loss": 0.2411,
453
+ "step": 210
454
+ },
455
+ {
456
+ "epoch": 33.92,
457
+ "eval_accuracy": 0.8620689655172413,
458
+ "eval_loss": 0.4726714789867401,
459
+ "eval_runtime": 2.0758,
460
+ "eval_samples_per_second": 41.912,
461
+ "eval_steps_per_second": 1.445,
462
+ "step": 212
463
+ },
464
+ {
465
+ "epoch": 34.88,
466
+ "eval_accuracy": 0.8735632183908046,
467
+ "eval_loss": 0.4611084461212158,
468
+ "eval_runtime": 2.0264,
469
+ "eval_samples_per_second": 42.934,
470
+ "eval_steps_per_second": 1.48,
471
+ "step": 218
472
+ },
473
+ {
474
+ "epoch": 35.2,
475
+ "grad_norm": 60.3193359375,
476
+ "learning_rate": 3.0092592592592593e-05,
477
+ "loss": 0.2108,
478
+ "step": 220
479
+ },
480
+ {
481
+ "epoch": 36.0,
482
+ "eval_accuracy": 0.8505747126436781,
483
+ "eval_loss": 0.5696073770523071,
484
+ "eval_runtime": 2.0919,
485
+ "eval_samples_per_second": 41.589,
486
+ "eval_steps_per_second": 1.434,
487
+ "step": 225
488
+ },
489
+ {
490
+ "epoch": 36.8,
491
+ "grad_norm": 16.915546417236328,
492
+ "learning_rate": 2.8935185185185186e-05,
493
+ "loss": 0.2143,
494
+ "step": 230
495
+ },
496
+ {
497
+ "epoch": 36.96,
498
+ "eval_accuracy": 0.8620689655172413,
499
+ "eval_loss": 0.49439194798469543,
500
+ "eval_runtime": 2.0923,
501
+ "eval_samples_per_second": 41.58,
502
+ "eval_steps_per_second": 1.434,
503
+ "step": 231
504
+ },
505
+ {
506
+ "epoch": 37.92,
507
+ "eval_accuracy": 0.8160919540229885,
508
+ "eval_loss": 0.5627816915512085,
509
+ "eval_runtime": 2.0503,
510
+ "eval_samples_per_second": 42.432,
511
+ "eval_steps_per_second": 1.463,
512
+ "step": 237
513
+ },
514
+ {
515
+ "epoch": 38.4,
516
+ "grad_norm": 14.699493408203125,
517
+ "learning_rate": 2.777777777777778e-05,
518
+ "loss": 0.1663,
519
+ "step": 240
520
+ },
521
+ {
522
+ "epoch": 38.88,
523
+ "eval_accuracy": 0.8045977011494253,
524
+ "eval_loss": 0.6131365895271301,
525
+ "eval_runtime": 2.0693,
526
+ "eval_samples_per_second": 42.044,
527
+ "eval_steps_per_second": 1.45,
528
+ "step": 243
529
+ },
530
+ {
531
+ "epoch": 40.0,
532
+ "grad_norm": 25.7874755859375,
533
+ "learning_rate": 2.6620370370370372e-05,
534
+ "loss": 0.1714,
535
+ "step": 250
536
+ },
537
+ {
538
+ "epoch": 40.0,
539
+ "eval_accuracy": 0.8505747126436781,
540
+ "eval_loss": 0.4961901605129242,
541
+ "eval_runtime": 2.0252,
542
+ "eval_samples_per_second": 42.959,
543
+ "eval_steps_per_second": 1.481,
544
+ "step": 250
545
+ },
546
+ {
547
+ "epoch": 40.96,
548
+ "eval_accuracy": 0.8390804597701149,
549
+ "eval_loss": 0.5022612810134888,
550
+ "eval_runtime": 2.127,
551
+ "eval_samples_per_second": 40.904,
552
+ "eval_steps_per_second": 1.41,
553
+ "step": 256
554
+ },
555
+ {
556
+ "epoch": 41.6,
557
+ "grad_norm": 24.087005615234375,
558
+ "learning_rate": 2.5462962962962965e-05,
559
+ "loss": 0.174,
560
+ "step": 260
561
+ },
562
+ {
563
+ "epoch": 41.92,
564
+ "eval_accuracy": 0.8275862068965517,
565
+ "eval_loss": 0.48418501019477844,
566
+ "eval_runtime": 2.0168,
567
+ "eval_samples_per_second": 43.137,
568
+ "eval_steps_per_second": 1.487,
569
+ "step": 262
570
+ },
571
+ {
572
+ "epoch": 42.88,
573
+ "eval_accuracy": 0.8275862068965517,
574
+ "eval_loss": 0.46790340542793274,
575
+ "eval_runtime": 2.0909,
576
+ "eval_samples_per_second": 41.609,
577
+ "eval_steps_per_second": 1.435,
578
+ "step": 268
579
+ },
580
+ {
581
+ "epoch": 43.2,
582
+ "grad_norm": 13.284588813781738,
583
+ "learning_rate": 2.4305555555555558e-05,
584
+ "loss": 0.138,
585
+ "step": 270
586
+ },
587
+ {
588
+ "epoch": 44.0,
589
+ "eval_accuracy": 0.8160919540229885,
590
+ "eval_loss": 0.6270841956138611,
591
+ "eval_runtime": 2.1069,
592
+ "eval_samples_per_second": 41.294,
593
+ "eval_steps_per_second": 1.424,
594
+ "step": 275
595
+ },
596
+ {
597
+ "epoch": 44.8,
598
+ "grad_norm": 14.41830062866211,
599
+ "learning_rate": 2.314814814814815e-05,
600
+ "loss": 0.1437,
601
+ "step": 280
602
+ },
603
+ {
604
+ "epoch": 44.96,
605
+ "eval_accuracy": 0.8505747126436781,
606
+ "eval_loss": 0.5325595736503601,
607
+ "eval_runtime": 2.1982,
608
+ "eval_samples_per_second": 39.578,
609
+ "eval_steps_per_second": 1.365,
610
+ "step": 281
611
+ },
612
+ {
613
+ "epoch": 45.92,
614
+ "eval_accuracy": 0.8160919540229885,
615
+ "eval_loss": 0.5655315518379211,
616
+ "eval_runtime": 2.0683,
617
+ "eval_samples_per_second": 42.063,
618
+ "eval_steps_per_second": 1.45,
619
+ "step": 287
620
+ },
621
+ {
622
+ "epoch": 46.4,
623
+ "grad_norm": 17.588279724121094,
624
+ "learning_rate": 2.1990740740740743e-05,
625
+ "loss": 0.136,
626
+ "step": 290
627
+ },
628
+ {
629
+ "epoch": 46.88,
630
+ "eval_accuracy": 0.8390804597701149,
631
+ "eval_loss": 0.46718767285346985,
632
+ "eval_runtime": 2.0892,
633
+ "eval_samples_per_second": 41.643,
634
+ "eval_steps_per_second": 1.436,
635
+ "step": 293
636
+ },
637
+ {
638
+ "epoch": 48.0,
639
+ "grad_norm": 22.864524841308594,
640
+ "learning_rate": 2.0833333333333336e-05,
641
+ "loss": 0.1401,
642
+ "step": 300
643
+ },
644
+ {
645
+ "epoch": 48.0,
646
+ "eval_accuracy": 0.8620689655172413,
647
+ "eval_loss": 0.498960942029953,
648
+ "eval_runtime": 2.0484,
649
+ "eval_samples_per_second": 42.471,
650
+ "eval_steps_per_second": 1.465,
651
+ "step": 300
652
+ },
653
+ {
654
+ "epoch": 48.96,
655
+ "eval_accuracy": 0.8275862068965517,
656
+ "eval_loss": 0.5445386171340942,
657
+ "eval_runtime": 2.0365,
658
+ "eval_samples_per_second": 42.721,
659
+ "eval_steps_per_second": 1.473,
660
+ "step": 306
661
+ },
662
+ {
663
+ "epoch": 49.6,
664
+ "grad_norm": 22.651620864868164,
665
+ "learning_rate": 1.967592592592593e-05,
666
+ "loss": 0.1281,
667
+ "step": 310
668
+ },
669
+ {
670
+ "epoch": 49.92,
671
+ "eval_accuracy": 0.8735632183908046,
672
+ "eval_loss": 0.47610902786254883,
673
+ "eval_runtime": 2.1166,
674
+ "eval_samples_per_second": 41.104,
675
+ "eval_steps_per_second": 1.417,
676
+ "step": 312
677
+ },
678
+ {
679
+ "epoch": 50.88,
680
+ "eval_accuracy": 0.8505747126436781,
681
+ "eval_loss": 0.5665103793144226,
682
+ "eval_runtime": 2.1168,
683
+ "eval_samples_per_second": 41.1,
684
+ "eval_steps_per_second": 1.417,
685
+ "step": 318
686
+ },
687
+ {
688
+ "epoch": 51.2,
689
+ "grad_norm": 26.539594650268555,
690
+ "learning_rate": 1.8518518518518518e-05,
691
+ "loss": 0.1156,
692
+ "step": 320
693
+ },
694
+ {
695
+ "epoch": 52.0,
696
+ "eval_accuracy": 0.8505747126436781,
697
+ "eval_loss": 0.5089926719665527,
698
+ "eval_runtime": 2.0775,
699
+ "eval_samples_per_second": 41.877,
700
+ "eval_steps_per_second": 1.444,
701
+ "step": 325
702
+ },
703
+ {
704
+ "epoch": 52.8,
705
+ "grad_norm": 23.464221954345703,
706
+ "learning_rate": 1.736111111111111e-05,
707
+ "loss": 0.0981,
708
+ "step": 330
709
+ },
710
+ {
711
+ "epoch": 52.96,
712
+ "eval_accuracy": 0.8505747126436781,
713
+ "eval_loss": 0.5152259469032288,
714
+ "eval_runtime": 2.0607,
715
+ "eval_samples_per_second": 42.219,
716
+ "eval_steps_per_second": 1.456,
717
+ "step": 331
718
+ },
719
+ {
720
+ "epoch": 53.92,
721
+ "eval_accuracy": 0.8160919540229885,
722
+ "eval_loss": 0.5466004610061646,
723
+ "eval_runtime": 2.0591,
724
+ "eval_samples_per_second": 42.251,
725
+ "eval_steps_per_second": 1.457,
726
+ "step": 337
727
+ },
728
+ {
729
+ "epoch": 54.4,
730
+ "grad_norm": 14.581974983215332,
731
+ "learning_rate": 1.6203703703703704e-05,
732
+ "loss": 0.1055,
733
+ "step": 340
734
+ },
735
+ {
736
+ "epoch": 54.88,
737
+ "eval_accuracy": 0.8275862068965517,
738
+ "eval_loss": 0.5390048623085022,
739
+ "eval_runtime": 2.0443,
740
+ "eval_samples_per_second": 42.558,
741
+ "eval_steps_per_second": 1.468,
742
+ "step": 343
743
+ },
744
+ {
745
+ "epoch": 56.0,
746
+ "grad_norm": 14.774139404296875,
747
+ "learning_rate": 1.5046296296296297e-05,
748
+ "loss": 0.112,
749
+ "step": 350
750
+ },
751
+ {
752
+ "epoch": 56.0,
753
+ "eval_accuracy": 0.8505747126436781,
754
+ "eval_loss": 0.5574498176574707,
755
+ "eval_runtime": 2.0874,
756
+ "eval_samples_per_second": 41.679,
757
+ "eval_steps_per_second": 1.437,
758
+ "step": 350
759
+ },
+ {
+ "epoch": 56.96,
+ "eval_accuracy": 0.8505747126436781,
+ "eval_loss": 0.5448784828186035,
+ "eval_runtime": 2.0514,
+ "eval_samples_per_second": 42.41,
+ "eval_steps_per_second": 1.462,
+ "step": 356
+ },
+ {
+ "epoch": 57.6,
+ "grad_norm": 18.17756462097168,
+ "learning_rate": 1.388888888888889e-05,
+ "loss": 0.0855,
+ "step": 360
+ },
+ {
+ "epoch": 57.92,
+ "eval_accuracy": 0.8505747126436781,
+ "eval_loss": 0.5390240550041199,
+ "eval_runtime": 2.077,
+ "eval_samples_per_second": 41.888,
+ "eval_steps_per_second": 1.444,
+ "step": 362
+ },
+ {
+ "epoch": 58.88,
+ "eval_accuracy": 0.8505747126436781,
+ "eval_loss": 0.5206344723701477,
+ "eval_runtime": 2.0568,
+ "eval_samples_per_second": 42.299,
+ "eval_steps_per_second": 1.459,
+ "step": 368
+ },
+ {
+ "epoch": 59.2,
+ "grad_norm": 40.29678726196289,
+ "learning_rate": 1.2731481481481482e-05,
+ "loss": 0.0899,
+ "step": 370
+ },
+ {
+ "epoch": 60.0,
+ "eval_accuracy": 0.8620689655172413,
+ "eval_loss": 0.5475941300392151,
+ "eval_runtime": 2.063,
+ "eval_samples_per_second": 42.172,
+ "eval_steps_per_second": 1.454,
+ "step": 375
+ },
+ {
+ "epoch": 60.8,
+ "grad_norm": 33.910377502441406,
+ "learning_rate": 1.1574074074074075e-05,
+ "loss": 0.1026,
+ "step": 380
+ },
+ {
+ "epoch": 60.96,
+ "eval_accuracy": 0.8505747126436781,
+ "eval_loss": 0.5344437956809998,
+ "eval_runtime": 2.298,
+ "eval_samples_per_second": 37.858,
+ "eval_steps_per_second": 1.305,
+ "step": 381
+ },
+ {
+ "epoch": 61.92,
+ "eval_accuracy": 0.8390804597701149,
+ "eval_loss": 0.553070068359375,
+ "eval_runtime": 2.1032,
+ "eval_samples_per_second": 41.366,
+ "eval_steps_per_second": 1.426,
+ "step": 387
+ },
+ {
+ "epoch": 62.4,
+ "grad_norm": 13.71580982208252,
+ "learning_rate": 1.0416666666666668e-05,
+ "loss": 0.0799,
+ "step": 390
+ },
+ {
+ "epoch": 62.88,
+ "eval_accuracy": 0.8275862068965517,
+ "eval_loss": 0.57228684425354,
+ "eval_runtime": 2.0779,
+ "eval_samples_per_second": 41.868,
+ "eval_steps_per_second": 1.444,
+ "step": 393
+ },
+ {
+ "epoch": 64.0,
+ "grad_norm": 28.238468170166016,
+ "learning_rate": 9.259259259259259e-06,
+ "loss": 0.0844,
+ "step": 400
+ },
+ {
+ "epoch": 64.0,
+ "eval_accuracy": 0.8160919540229885,
+ "eval_loss": 0.5339850783348083,
+ "eval_runtime": 2.0258,
+ "eval_samples_per_second": 42.946,
+ "eval_steps_per_second": 1.481,
+ "step": 400
+ },
+ {
+ "epoch": 64.96,
+ "eval_accuracy": 0.8735632183908046,
+ "eval_loss": 0.52364581823349,
+ "eval_runtime": 2.0251,
+ "eval_samples_per_second": 42.961,
+ "eval_steps_per_second": 1.481,
+ "step": 406
+ },
+ {
+ "epoch": 65.6,
+ "grad_norm": 10.24560832977295,
+ "learning_rate": 8.101851851851852e-06,
+ "loss": 0.0724,
+ "step": 410
+ },
+ {
+ "epoch": 65.92,
+ "eval_accuracy": 0.8390804597701149,
+ "eval_loss": 0.6136645674705505,
+ "eval_runtime": 2.03,
+ "eval_samples_per_second": 42.858,
+ "eval_steps_per_second": 1.478,
+ "step": 412
+ },
+ {
+ "epoch": 66.88,
+ "eval_accuracy": 0.8275862068965517,
+ "eval_loss": 0.5824962854385376,
+ "eval_runtime": 2.0787,
+ "eval_samples_per_second": 41.854,
+ "eval_steps_per_second": 1.443,
+ "step": 418
+ },
+ {
+ "epoch": 67.2,
+ "grad_norm": 20.803382873535156,
+ "learning_rate": 6.944444444444445e-06,
+ "loss": 0.0867,
+ "step": 420
+ },
+ {
+ "epoch": 68.0,
+ "eval_accuracy": 0.8620689655172413,
+ "eval_loss": 0.510515034198761,
+ "eval_runtime": 2.0565,
+ "eval_samples_per_second": 42.305,
+ "eval_steps_per_second": 1.459,
+ "step": 425
+ },
+ {
+ "epoch": 68.8,
+ "grad_norm": 15.880162239074707,
+ "learning_rate": 5.787037037037038e-06,
+ "loss": 0.071,
+ "step": 430
+ },
+ {
+ "epoch": 68.96,
+ "eval_accuracy": 0.8505747126436781,
+ "eval_loss": 0.5272470116615295,
+ "eval_runtime": 2.0378,
+ "eval_samples_per_second": 42.693,
+ "eval_steps_per_second": 1.472,
+ "step": 431
+ },
+ {
+ "epoch": 69.92,
+ "eval_accuracy": 0.8505747126436781,
+ "eval_loss": 0.5523571372032166,
+ "eval_runtime": 2.0569,
+ "eval_samples_per_second": 42.297,
+ "eval_steps_per_second": 1.459,
+ "step": 437
+ },
+ {
+ "epoch": 70.4,
+ "grad_norm": 14.639904975891113,
+ "learning_rate": 4.6296296296296296e-06,
+ "loss": 0.0723,
+ "step": 440
+ },
+ {
+ "epoch": 70.88,
+ "eval_accuracy": 0.8390804597701149,
+ "eval_loss": 0.5507646799087524,
+ "eval_runtime": 2.1114,
+ "eval_samples_per_second": 41.205,
+ "eval_steps_per_second": 1.421,
+ "step": 443
+ },
+ {
+ "epoch": 72.0,
+ "grad_norm": 6.164122104644775,
+ "learning_rate": 3.4722222222222224e-06,
+ "loss": 0.0748,
+ "step": 450
+ },
+ {
+ "epoch": 72.0,
+ "eval_accuracy": 0.8160919540229885,
+ "eval_loss": 0.568942129611969,
+ "eval_runtime": 2.0852,
+ "eval_samples_per_second": 41.723,
+ "eval_steps_per_second": 1.439,
+ "step": 450
+ },
+ {
+ "epoch": 72.96,
+ "eval_accuracy": 0.8505747126436781,
+ "eval_loss": 0.555583119392395,
+ "eval_runtime": 2.0316,
+ "eval_samples_per_second": 42.824,
+ "eval_steps_per_second": 1.477,
+ "step": 456
+ },
+ {
+ "epoch": 73.6,
+ "grad_norm": 11.653559684753418,
+ "learning_rate": 2.3148148148148148e-06,
+ "loss": 0.0589,
+ "step": 460
+ },
+ {
+ "epoch": 73.92,
+ "eval_accuracy": 0.8505747126436781,
+ "eval_loss": 0.5452274084091187,
+ "eval_runtime": 2.0938,
+ "eval_samples_per_second": 41.551,
+ "eval_steps_per_second": 1.433,
+ "step": 462
+ },
+ {
+ "epoch": 74.88,
+ "eval_accuracy": 0.8620689655172413,
+ "eval_loss": 0.5475078225135803,
+ "eval_runtime": 2.0547,
+ "eval_samples_per_second": 42.342,
+ "eval_steps_per_second": 1.46,
+ "step": 468
+ },
+ {
+ "epoch": 75.2,
+ "grad_norm": 21.146989822387695,
+ "learning_rate": 1.1574074074074074e-06,
+ "loss": 0.0719,
+ "step": 470
+ },
+ {
+ "epoch": 76.0,
+ "eval_accuracy": 0.8620689655172413,
+ "eval_loss": 0.5483731031417847,
+ "eval_runtime": 2.1022,
+ "eval_samples_per_second": 41.386,
+ "eval_steps_per_second": 1.427,
+ "step": 475
+ },
+ {
+ "epoch": 76.8,
+ "grad_norm": 12.87066650390625,
+ "learning_rate": 0.0,
+ "loss": 0.0801,
+ "step": 480
+ },
+ {
+ "epoch": 76.8,
+ "eval_accuracy": 0.8620689655172413,
+ "eval_loss": 0.5496163368225098,
+ "eval_runtime": 2.0924,
+ "eval_samples_per_second": 41.58,
+ "eval_steps_per_second": 1.434,
+ "step": 480
+ },
+ {
+ "epoch": 76.8,
+ "step": 480,
+ "total_flos": 1.514063180200919e+18,
+ "train_loss": 0.449434948215882,
+ "train_runtime": 1985.4684,
+ "train_samples_per_second": 31.549,
+ "train_steps_per_second": 0.242
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 480,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 80,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": false,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 1.514063180200919e+18,
+ "train_batch_size": 32,
+ "trial_name": null,
+ "trial_params": null
+ }
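The `learning_rate` values in `log_history` drop by about 1.157e-06 every 10 steps and reach 0.0 at step 480, which is consistent with a linear decay schedule. The sketch below recomputes a few of the logged values under stated assumptions: a base rate of 5e-05 (from the model card), `max_steps` of 480 (from this file), and 48 warmup steps, i.e. 10% of `max_steps`. The warmup length is inferred from the logged slope, not read from `training_args.bin`. The formula follows the shape of the linear-with-warmup schedule in `transformers`; it is a consistency check, not the Trainer's own code.

```python
# Sketch: reproduce logged learning rates assuming linear warmup + linear decay.
BASE_LR = 5e-5       # learning_rate from the model card
MAX_STEPS = 480      # "max_steps" in trainer_state.json
WARMUP_STEPS = 48    # ASSUMPTION: 10% of MAX_STEPS, inferred from the logged values


def linear_lr(step: int) -> float:
    """Learning rate after `step` optimizer steps under the assumed schedule."""
    if step < WARMUP_STEPS:
        # Warmup phase: ramp linearly from 0 up to BASE_LR.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Decay phase: fall linearly from BASE_LR to 0 at MAX_STEPS.
    return BASE_LR * max(0.0, (MAX_STEPS - step) / max(1, MAX_STEPS - WARMUP_STEPS))


# (step, logged learning_rate) pairs copied from the log_history entries above.
LOGGED = [
    (340, 1.6203703703703704e-05),
    (400, 9.259259259259259e-06),
    (470, 1.1574074074074074e-06),
    (480, 0.0),
]

for step, logged in LOGGED:
    predicted = linear_lr(step)
    print(f"step {step}: logged={logged:.6e} predicted={predicted:.6e}")
    assert abs(predicted - logged) < 1e-12
```

With a different warmup length the decay-phase values would not line up, so agreement here is reasonable (though not conclusive) evidence for the inferred 48-step warmup.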
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3035e41ea7baa5c7336ad050a3b81dcc016e5c5dc831d5e355c15b8a865ce2b3
+ size 5240
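`training_args.bin` is committed as a Git LFS pointer: three `key value` lines (`version`, `oid`, `size`) that stand in for the real 5240-byte file held in LFS storage. A minimal sketch for reading such a pointer in Python, assuming the pointer text is already in memory. The helper name `parse_lfs_pointer` is hypothetical and not part of any library.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its fields (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # each line is "key value"
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    algorithm, _, digest = fields["oid"].partition(":")  # e.g. "sha256:<hex digest>"
    return {
        "version": fields["version"],
        "oid_algorithm": algorithm,
        "oid": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:3035e41ea7baa5c7336ad050a3b81dcc016e5c5dc831d5e355c15b8a865ce2b3
size 5240
"""

info = parse_lfs_pointer(POINTER)
print(info["oid_algorithm"], info["size"])  # sha256 5240
```

The `oid` is the SHA-256 digest of the real file's contents, so a downloaded copy can be checked against it with `hashlib.sha256`.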