AswanthCManoj/results

PEFT

Safetensors

Generated from Trainer

AswanthCManoj committed on Jan 14

Commit 40d2fa9 • 1 Parent(s): 6603e4b

azma-deepseek-coder-1.3b-instruct-structured-output

Files changed (1)

README.md +11 -23

@@ -18,7 +18,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-instruct) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.1804
 ## Model description
@@ -37,7 +37,7 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 2.5e-05
 - train_batch_size: 4
 - eval_batch_size: 4
 - seed: 42
@@ -47,33 +47,21 @@ The following hyperparameters were used during training:
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.03
 - lr_scheduler_warmup_steps: 50
-- training_steps: 500
 - mixed_precision_training: Native AMP
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 1.8004        | 0.01  | 25   | 1.8163          |
-| 1.8977        | 0.01  | 50   | 1.6053          |
-| 1.5414        | 0.02  | 75   | 1.4712          |
-| 0.9012        | 0.02  | 100  | 1.3793          |
-| 1.2741        | 0.03  | 125  | 1.3288          |
-| 0.7702        | 0.04  | 150  | 1.2922          |
-| 1.3633        | 0.04  | 175  | 1.2580          |
-| 0.5385        | 0.05  | 200  | 1.2341          |
-| 1.3172        | 0.06  | 225  | 1.2157          |
-| 0.5901        | 0.06  | 250  | 1.2088          |
-| 1.2612        | 0.07  | 275  | 1.1989          |
-| 0.6468        | 0.07  | 300  | 1.1937          |
-| 1.2991        | 0.08  | 325  | 1.1891          |
-| 0.5557        | 0.09  | 350  | 1.1858          |
-| 1.148         | 0.09  | 375  | 1.1838          |
-| 0.634         | 0.1   | 400  | 1.1823          |
-| 1.2453        | 0.1   | 425  | 1.1812          |
-| 0.5921        | 0.11  | 450  | 1.1807          |
-| 1.0155        | 0.12  | 475  | 1.1805          |
-| 0.6354        | 0.12  | 500  | 1.1804          |
 ### Framework versions

 This model is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-instruct) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 1.1485
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 0.0001
 - train_batch_size: 4
 - eval_batch_size: 4
 - seed: 42
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.03
 - lr_scheduler_warmup_steps: 50
+- training_steps: 200
 - mixed_precision_training: Native AMP
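
The hyperparameters above list both `lr_scheduler_warmup_ratio: 0.03` and `lr_scheduler_warmup_steps: 50`. In the Hugging Face `TrainingArguments` documentation, a non-zero warmup step count takes precedence over the ratio, so the sketch below assumes 50 warmup steps. It is a plain-Python approximation of a linear-warmup, half-cosine-decay schedule using the new values (`learning_rate: 0.0001`, `training_steps: 200`), not the code used in this commit. The function name `lr_at_step` is hypothetical.

```python
import math


def lr_at_step(step, base_lr=1e-4, warmup_steps=50, total_steps=200):
    """Approximate learning rate at a given optimizer step.

    Assumes linear warmup from 0 to base_lr, then a half-cosine decay to 0.
    The defaults are taken from the updated README values in this diff.
    """
    if step < warmup_steps:
        # Linear ramp: 0 at step 0, base_lr at step == warmup_steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 25, 50, 125, 200):
        print(f"step {step:>3}: lr = {lr_at_step(step):.2e}")
```

Under these assumptions, warmup covers the first quarter of the 200-step run, so the first logged evaluation at step 25 happens at roughly half the peak learning rate.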
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
+| 1.3759        | 0.02  | 25   | 1.3449          |
+| 0.5848        | 0.03  | 50   | 1.2507          |
+| 1.0184        | 0.05  | 75   | 1.1688          |
+| 0.5275        | 0.07  | 100  | 1.1849          |
+| 0.9792        | 0.08  | 125  | 1.1529          |
+| 0.5695        | 0.1   | 150  | 1.1572          |
+| 0.8567        | 0.11  | 175  | 1.1495          |
+| 0.5234        | 0.13  | 200  | 1.1485          |
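
To put the two final validation losses in this diff (1.1804 before, 1.1485 after) on a more familiar scale, they can be converted to perplexity. This is a minimal sketch that assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual case for a causal language model trained with the Trainer but is not stated in the README. The helper name `perplexity` is hypothetical.

```python
import math


def perplexity(mean_cross_entropy_nats):
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


# Final validation losses copied from the two versions of the README.
final_eval_loss = {
    "previous (lr 2.5e-05, 500 steps)": 1.1804,
    "updated (lr 0.0001, 200 steps)": 1.1485,
}

if __name__ == "__main__":
    for label, loss in final_eval_loss.items():
        print(f"{label}: loss = {loss:.4f}, perplexity = {perplexity(loss):.3f}")
```

The updated table is not monotonic (validation loss rises at steps 100 and 150), so comparing only the final rows is a rough guide rather than a full comparison of the two runs.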
 ### Framework versions