andrewAmani/results_packing

Generated from Trainer

andrewAmani committed on Jul 4

Commit eff5d43 • 1 Parent(s): 22c06fc

Model save

Files changed (1)

README.md +15 -1

@@ -14,6 +14,8 @@ should probably proofread and complete it, then remove this comment. -->
 # results_packing
 This model is a fine-tuned version of [hivaze/ParaLex-Llama-3-8B-SFT](https://huggingface.co/hivaze/ParaLex-Llama-3-8B-SFT) on the None dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.3215
 ## Model description
@@ -40,7 +42,19 @@ The following hyperparameters were used during training:
 - total_train_batch_size: 17
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
-- num_epochs: 32
+- num_epochs: 8
+### Training results
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 0.4022        | 1.25  | 5    | 0.3324          |
+| 0.3492        | 2.5   | 10   | 0.3161          |
+| 0.3181        | 3.75  | 15   | 0.3138          |
+| 0.2808        | 5.0   | 20   | 0.3177          |
+| 0.2571        | 6.25  | 25   | 0.3206          |
+| 0.2424        | 7.5   | 30   | 0.3215          |
 ### Framework versions
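
The training results table added in this commit can be read with a short sketch. It is not part of the commit: the row values are copied from the table, and the helper name `best_eval_row` is hypothetical. It shows that the reported evaluation loss (0.3215) is the value at the last logged step, while the lowest validation loss in the table occurs earlier, and that the logged epochs imply 4 optimizer steps per epoch.

```python
# Rows copied from the "Training results" table in the diff above.
# Tuple layout: (training_loss, epoch, step, validation_loss)
rows = [
    (0.4022, 1.25, 5, 0.3324),
    (0.3492, 2.5, 10, 0.3161),
    (0.3181, 3.75, 15, 0.3138),
    (0.2808, 5.0, 20, 0.3177),
    (0.2571, 6.25, 25, 0.3206),
    (0.2424, 7.5, 30, 0.3215),
]


def best_eval_row(table):
    """Return the row with the lowest validation loss (hypothetical helper)."""
    return min(table, key=lambda row: row[3])


best = best_eval_row(rows)
final = rows[-1]

# Steps per epoch implied by the first logged row: 5 steps at epoch 1.25.
steps_per_epoch = rows[0][2] / rows[0][1]

print(f"lowest validation loss: {best[3]} at step {best[2]} (epoch {best[1]})")
print(f"last logged validation loss: {final[3]} at step {final[2]}")
print(f"implied steps per epoch: {steps_per_epoch}")
```

Training loss keeps falling after step 15 while validation loss rises, so the final checkpoint is not the one with the best validation loss in this table.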