End of training

- README.md +3 -2
- logs/learning_rate=0.0002, lr_scheduler_kwargs=__power___0.7___lr_end___2e-05_, lr_scheduler_type=polynomial, per_device_train_batch_size=8, warmup_ratio=0.1/events.out.tfevents.1726983816.1c1a426a2fee +2 -2
- logs/learning_rate=0.0002, lr_scheduler_kwargs=__power___0.7___lr_end___2e-05_, lr_scheduler_type=polynomial, per_device_train_batch_size=8, warmup_ratio=0.1/events.out.tfevents.1727015278.1c1a426a2fee +3 -0
- model.safetensors +1 -1
README.md
CHANGED
@@ -150,6 +150,7 @@ The following hyperparameters were used during training:
 - seed: `42`
 - optimizer: `Adam with betas=(0.9,0.999) and epsilon=1e-08`
 - lr_scheduler_type: `polynomial`
+- lr_scheduler_warmup_ratio: `0.1`
 - num_epochs: `1.0`
 - distillation_objective: `DistillationObjective(
     logits_loss_component=LossComponent(
@@ -163,7 +164,7 @@ The following hyperparameters were used during training:
         weight=0
     )
 )`
-- lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at
+- lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at 0x778665672650>`
 - student_model_name_or_path: `None`
 - student_config_name_or_path: `None`
 - student_model_config: `{'num_hidden_layers': 15}`
@@ -187,7 +188,7 @@ The following hyperparameters were used during training:
 - gradient_accumulation_steps: `1`
 - weight_decay: `0.0`
 - max_grad_norm: `1.0`
-- warmup_ratio: `0.
+- warmup_ratio: `0.1`
 - warmup_steps: `0`
 - gradient_checkpointing: `True`
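The README change records a polynomial decay schedule (`power=0.7`, `lr_end=2e-05`) with a warmup ratio of `0.1`, and the logged `lr_scheduler` repr shows it is realized as a `torch.optim.lr_scheduler.LambdaLR`. A minimal sketch of how such a schedule can be reconstructed, assuming the run used `transformers.get_polynomial_decay_schedule_with_warmup`; the tiny model and the total step count below are placeholders for illustration, not values from this commit:

```python
# Sketch: rebuilding the logged polynomial schedule with warmup.
# Assumption: transformers' get_polynomial_decay_schedule_with_warmup
# was used; the Linear model and num_training_steps are placeholders.
import torch
from transformers import get_polynomial_decay_schedule_with_warmup

model = torch.nn.Linear(10, 10)  # placeholder model
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=2e-4,              # learning_rate=0.0002 from the run name
    betas=(0.9, 0.999),   # optimizer: Adam with betas=(0.9,0.999)
    eps=1e-8,             # and epsilon=1e-08
)

num_training_steps = 1000           # placeholder total step count
warmup_ratio = 0.1                  # warmup_ratio: 0.1
scheduler = get_polynomial_decay_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(warmup_ratio * num_training_steps),
    num_training_steps=num_training_steps,
    lr_end=2e-5,                    # lr_scheduler_kwargs: lr_end
    power=0.7,                      # lr_scheduler_kwargs: power
)
print(scheduler)  # -> <torch.optim.lr_scheduler.LambdaLR object at 0x...>
```

The helper returns a `LambdaLR`, which is why the README serializes the scheduler as an opaque object repr rather than as its parameters.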
logs/learning_rate=0.0002, lr_scheduler_kwargs=__power___0.7___lr_end___2e-05_, lr_scheduler_type=polynomial, per_device_train_batch_size=8, warmup_ratio=0.1/events.out.tfevents.1726983816.1c1a426a2fee
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:5119150f80c74b99fe989d96f21849facb83c524f54e9dfe779272cf3c683c68
+size 3432096
logs/learning_rate=0.0002, lr_scheduler_kwargs=__power___0.7___lr_end___2e-05_, lr_scheduler_type=polynomial, per_device_train_batch_size=8, warmup_ratio=0.1/events.out.tfevents.1727015278.1c1a426a2fee
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2535e906d8815196c7061bebbef993d7a2bb3aa7cdc5757627fd6b0092587ddd
+size 529
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:e7916a52bfee88fb1994cf3ebd4fa09fcdee6d685846d2f5540cb62504e44c8c
 size 325669528
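The event-file and `model.safetensors` entries above are Git LFS pointer files: three lines giving the spec version, the sha256 of the actual blob, and its size in bytes. A minimal sketch of checking a downloaded artifact against its pointer, using the `model.safetensors` values from this commit; the local file path is an assumption about where the blob was saved:

```python
# Sketch: verifying a downloaded blob against its Git LFS pointer.
# The pointer text is the new model.safetensors pointer from this commit;
# the "model.safetensors" path below assumes a local download.
import hashlib

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:e7916a52bfee88fb1994cf3ebd4fa09fcdee6d685846d2f5540cb62504e44c8c
size 325669528"""

# Each pointer line is "key value"; parse into a dict.
fields = dict(line.split(" ", 1) for line in pointer.splitlines())
expected_oid = fields["oid"].removeprefix("sha256:")
expected_size = int(fields["size"])

# Stream the file so large checkpoints don't need to fit in memory.
digest = hashlib.sha256()
size = 0
with open("model.safetensors", "rb") as f:  # assumed local path
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)
        size += len(chunk)

assert size == expected_size, "size mismatch with LFS pointer"
assert digest.hexdigest() == expected_oid, "sha256 mismatch with LFS pointer"
print("blob matches pointer")
```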