End of training
README.md CHANGED
@@ -20,7 +20,7 @@ model-index:
   metrics:
   - name: Bleu
     type: bleu
-    value: 1.
+    value: 1.5414
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -30,9 +30,9 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model was trained from scratch on the opus_books dataset.
 It achieves the following results on the evaluation set:
-- Loss: 2.
-- Bleu: 1.
-- Gen Len: 18.
+- Loss: 2.4043
+- Bleu: 1.5414
+- Gen Len: 18.3803
 
 ## Model description
 
@@ -57,23 +57,113 @@ The following hyperparameters were used during training:
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- num_epochs:
+- num_epochs: 100
 - mixed_precision_training: Native AMP
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
 |:-------------:|:-----:|:----:|:---------------:|:------:|:-------:|
-| No log | 1.0 | 53 |
-| No log | 2.0 | 106 | 2.
-| No log | 3.0 | 159 | 2.
-| No log | 4.0 | 212 | 2.
-| No log | 5.0 | 265 | 2.
-| No log | 6.0 | 318 | 2.
-| No log | 7.0 | 371 | 2.
-| No log | 8.0 | 424 | 2.
-| No log | 9.0 | 477 | 2.
-| 3.
+| No log | 1.0 | 53 | 2.8753 | 1.3055 | 18.3192 |
+| No log | 2.0 | 106 | 2.8517 | 1.3879 | 18.3239 |
+| No log | 3.0 | 159 | 2.8330 | 1.4455 | 18.3286 |
+| No log | 4.0 | 212 | 2.8172 | 1.4054 | 18.3803 |
+| No log | 5.0 | 265 | 2.8011 | 1.4365 | 18.3709 |
+| No log | 6.0 | 318 | 2.7803 | 1.4315 | 18.3474 |
+| No log | 7.0 | 371 | 2.7683 | 1.4768 | 18.3286 |
+| No log | 8.0 | 424 | 2.7552 | 1.5171 | 18.3192 |
+| No log | 9.0 | 477 | 2.7394 | 1.488 | 18.3474 |
+| 3.0631 | 10.0 | 530 | 2.7270 | 1.5307 | 18.385 |
+| 3.0631 | 11.0 | 583 | 2.7156 | 1.505 | 18.3005 |
+| 3.0631 | 12.0 | 636 | 2.7000 | 1.3708 | 18.3146 |
+| 3.0631 | 13.0 | 689 | 2.6914 | 1.3796 | 18.3192 |
+| 3.0631 | 14.0 | 742 | 2.6818 | 1.4616 | 18.3005 |
+| 3.0631 | 15.0 | 795 | 2.6728 | 1.4487 | 18.3005 |
+| 3.0631 | 16.0 | 848 | 2.6596 | 1.3979 | 18.2911 |
+| 3.0631 | 17.0 | 901 | 2.6506 | 1.4544 | 18.3099 |
+| 3.0631 | 18.0 | 954 | 2.6381 | 1.3779 | 18.3239 |
+| 2.9232 | 19.0 | 1007 | 2.6313 | 1.4275 | 18.3052 |
+| 2.9232 | 20.0 | 1060 | 2.6223 | 1.4489 | 18.3521 |
+| 2.9232 | 21.0 | 1113 | 2.6139 | 1.4473 | 18.3803 |
+| 2.9232 | 22.0 | 1166 | 2.6058 | 1.4407 | 18.3333 |
+| 2.9232 | 23.0 | 1219 | 2.5985 | 1.4594 | 18.3192 |
+| 2.9232 | 24.0 | 1272 | 2.5899 | 1.4473 | 18.2911 |
+| 2.9232 | 25.0 | 1325 | 2.5832 | 1.4717 | 18.3521 |
+| 2.9232 | 26.0 | 1378 | 2.5752 | 1.4282 | 18.3333 |
+| 2.9232 | 27.0 | 1431 | 2.5699 | 1.3598 | 18.3239 |
+| 2.9232 | 28.0 | 1484 | 2.5628 | 1.409 | 18.3286 |
+| 2.807 | 29.0 | 1537 | 2.5577 | 1.3461 | 18.3568 |
+| 2.807 | 30.0 | 1590 | 2.5524 | 1.425 | 18.3803 |
+| 2.807 | 31.0 | 1643 | 2.5449 | 1.3638 | 18.3615 |
+| 2.807 | 32.0 | 1696 | 2.5413 | 1.3604 | 18.3897 |
+| 2.807 | 33.0 | 1749 | 2.5380 | 1.5423 | 18.3991 |
+| 2.807 | 34.0 | 1802 | 2.5335 | 1.5392 | 18.3944 |
+| 2.807 | 35.0 | 1855 | 2.5266 | 1.4923 | 18.3474 |
+| 2.807 | 36.0 | 1908 | 2.5210 | 1.445 | 18.3192 |
+| 2.807 | 37.0 | 1961 | 2.5151 | 1.453 | 18.3521 |
+| 2.7147 | 38.0 | 2014 | 2.5113 | 1.4277 | 18.3286 |
+| 2.7147 | 39.0 | 2067 | 2.5093 | 1.4015 | 18.3568 |
+| 2.7147 | 40.0 | 2120 | 2.5033 | 1.4314 | 18.3615 |
+| 2.7147 | 41.0 | 2173 | 2.4992 | 1.3861 | 18.3803 |
+| 2.7147 | 42.0 | 2226 | 2.4961 | 1.4661 | 18.385 |
+| 2.7147 | 43.0 | 2279 | 2.4933 | 1.4569 | 18.3803 |
+| 2.7147 | 44.0 | 2332 | 2.4887 | 1.5818 | 18.3803 |
+| 2.7147 | 45.0 | 2385 | 2.4863 | 1.5672 | 18.3803 |
+| 2.7147 | 46.0 | 2438 | 2.4807 | 1.5475 | 18.3568 |
+| 2.7147 | 47.0 | 2491 | 2.4790 | 1.4686 | 18.3568 |
+| 2.6478 | 48.0 | 2544 | 2.4742 | 1.5072 | 18.3615 |
+| 2.6478 | 49.0 | 2597 | 2.4720 | 1.6371 | 18.3897 |
+| 2.6478 | 50.0 | 2650 | 2.4690 | 1.5358 | 18.3239 |
+| 2.6478 | 51.0 | 2703 | 2.4663 | 1.5322 | 18.3239 |
+| 2.6478 | 52.0 | 2756 | 2.4630 | 1.5193 | 18.3427 |
+| 2.6478 | 53.0 | 2809 | 2.4590 | 1.5162 | 18.3333 |
+| 2.6478 | 54.0 | 2862 | 2.4565 | 1.5365 | 18.3239 |
+| 2.6478 | 55.0 | 2915 | 2.4535 | 1.5086 | 18.3709 |
+| 2.6478 | 56.0 | 2968 | 2.4514 | 1.5211 | 18.3521 |
+| 2.5967 | 57.0 | 3021 | 2.4499 | 1.5442 | 18.3709 |
+| 2.5967 | 58.0 | 3074 | 2.4483 | 1.5441 | 18.3709 |
+| 2.5967 | 59.0 | 3127 | 2.4456 | 1.5288 | 18.3709 |
+| 2.5967 | 60.0 | 3180 | 2.4419 | 1.4669 | 18.3897 |
+| 2.5967 | 61.0 | 3233 | 2.4409 | 1.4707 | 18.3756 |
+| 2.5967 | 62.0 | 3286 | 2.4394 | 1.5037 | 18.385 |
+| 2.5967 | 63.0 | 3339 | 2.4371 | 1.5251 | 18.3709 |
+| 2.5967 | 64.0 | 3392 | 2.4334 | 1.4897 | 18.3991 |
+| 2.5967 | 65.0 | 3445 | 2.4326 | 1.5373 | 18.385 |
+| 2.5967 | 66.0 | 3498 | 2.4326 | 1.5174 | 18.3944 |
+| 2.5514 | 67.0 | 3551 | 2.4292 | 1.5326 | 18.3803 |
+| 2.5514 | 68.0 | 3604 | 2.4291 | 1.5224 | 18.3709 |
+| 2.5514 | 69.0 | 3657 | 2.4264 | 1.4945 | 18.3709 |
+| 2.5514 | 70.0 | 3710 | 2.4238 | 1.5155 | 18.385 |
+| 2.5514 | 71.0 | 3763 | 2.4220 | 1.556 | 18.3803 |
+| 2.5514 | 72.0 | 3816 | 2.4214 | 1.5782 | 18.385 |
+| 2.5514 | 73.0 | 3869 | 2.4197 | 1.6084 | 18.3709 |
+| 2.5514 | 74.0 | 3922 | 2.4184 | 1.5642 | 18.3709 |
+| 2.5514 | 75.0 | 3975 | 2.4185 | 1.6182 | 18.3897 |
+| 2.5176 | 76.0 | 4028 | 2.4169 | 1.5632 | 18.3756 |
+| 2.5176 | 77.0 | 4081 | 2.4139 | 1.5853 | 18.385 |
+| 2.5176 | 78.0 | 4134 | 2.4136 | 1.5852 | 18.3897 |
+| 2.5176 | 79.0 | 4187 | 2.4128 | 1.5608 | 18.3897 |
+| 2.5176 | 80.0 | 4240 | 2.4123 | 1.5707 | 18.3897 |
+| 2.5176 | 81.0 | 4293 | 2.4109 | 1.5622 | 18.3944 |
+| 2.5176 | 82.0 | 4346 | 2.4104 | 1.5608 | 18.3803 |
+| 2.5176 | 83.0 | 4399 | 2.4101 | 1.561 | 18.3803 |
+| 2.5176 | 84.0 | 4452 | 2.4097 | 1.56 | 18.3944 |
+| 2.497 | 85.0 | 4505 | 2.4096 | 1.5644 | 18.3944 |
+| 2.497 | 86.0 | 4558 | 2.4075 | 1.5636 | 18.4038 |
+| 2.497 | 87.0 | 4611 | 2.4073 | 1.5779 | 18.3944 |
+| 2.497 | 88.0 | 4664 | 2.4069 | 1.5611 | 18.3944 |
+| 2.497 | 89.0 | 4717 | 2.4068 | 1.5827 | 18.3944 |
+| 2.497 | 90.0 | 4770 | 2.4063 | 1.558 | 18.3944 |
+| 2.497 | 91.0 | 4823 | 2.4057 | 1.533 | 18.3944 |
+| 2.497 | 92.0 | 4876 | 2.4050 | 1.5271 | 18.3944 |
+| 2.497 | 93.0 | 4929 | 2.4048 | 1.5655 | 18.4038 |
+| 2.497 | 94.0 | 4982 | 2.4049 | 1.5351 | 18.3803 |
+| 2.4847 | 95.0 | 5035 | 2.4045 | 1.5411 | 18.3803 |
+| 2.4847 | 96.0 | 5088 | 2.4046 | 1.5468 | 18.3803 |
+| 2.4847 | 97.0 | 5141 | 2.4046 | 1.5474 | 18.3803 |
+| 2.4847 | 98.0 | 5194 | 2.4045 | 1.5468 | 18.3803 |
+| 2.4847 | 99.0 | 5247 | 2.4044 | 1.5468 | 18.3803 |
+| 2.4847 | 100.0 | 5300 | 2.4043 | 1.5414 | 18.3803 |
 
 
 ### Framework versions
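The results table advances 53 optimizer steps per epoch, so 100 epochs give the 5300 steps in the final row, and `lr_scheduler_type: linear` means the learning rate decays linearly to zero over those steps (after any warmup). A minimal sketch of that schedule; the base learning rate and warmup count below are assumed placeholders, since neither appears in this excerpt:

```python
def linear_lr(step, base_lr=2e-5, warmup_steps=0, total_steps=5300):
    """Learning rate at a given optimizer step under a linear schedule:
    linear warmup from 0 to base_lr, then linear decay from base_lr to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)  # steps left until the LR hits 0
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# 53 steps/epoch x 100 epochs = 5300 total steps, matching the table's last row.
total_steps = 53 * 100
```

This mirrors the shape of the schedule the Trainer applies for `linear`; the exact base LR used in this run is listed elsewhere in the card's hyperparameter section.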
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:7f878daf8d74eca3e410b8c45213378200375cc12295a3c6839083669380ae77
 size 242041896
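The weights themselves are not stored in git: the diff above is over a Git LFS pointer file (spec v1), three `key value` lines giving the spec URL, the SHA-256 of the stored object, and its size in bytes. A small sketch of reading such a pointer:

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its fields.

    Each non-empty line is 'key value'; the oid value is prefixed with
    the hash algorithm, e.g. 'sha256:<hex digest>'.
    """
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "oid_algo": algo,             # hash algorithm, e.g. "sha256"
        "oid": digest,                # hex digest identifying the object
        "size": int(fields["size"]),  # object size in bytes
    }

# The new pointer contents from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:7f878daf8d74eca3e410b8c45213378200375cc12295a3c6839083669380ae77
size 242041896
"""
info = parse_lfs_pointer(pointer)
```

Only the oid changed in this commit; the size line is identical, so the new weight file happens to be byte-for-byte the same length as the old one.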
runs/Dec17_12-11-55_f20db7578e83/events.out.tfevents.1702815116.f20db7578e83.6609.6 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:29ac222ff941053a0a3c98d11cc1d06243e1c98ec23d77d43c46d81bcf535ff9
+size 44309
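With per-epoch evaluation logged as in the training-results table above, checkpoint selection reduces to a minimum over validation loss. A quick sketch over a few (epoch, val loss, BLEU) rows copied from that table, illustrating that the best loss and the best BLEU need not land on the same epoch:

```python
# Selected rows from the training-results table: (epoch, validation loss, BLEU).
rows = [
    (1, 2.8753, 1.3055),
    (50, 2.4690, 1.5358),
    (73, 2.4197, 1.6084),
    (100, 2.4043, 1.5414),
]

# Best validation loss is the final epoch, matching the summary at the top
# of the card (Loss: 2.4043, Bleu: 1.5414).
best_by_loss = min(rows, key=lambda r: r[1])

# Best BLEU among these rows is epoch 73; in the full table the peak is
# 1.6371 at epoch 49, so loss- and BLEU-selected checkpoints diverge.
best_by_bleu = max(rows, key=lambda r: r[2])
```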