Llama-31-8B_task-1_120-samples_config-4

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the GaetanMichelet/chat-60_ft_task-1 and GaetanMichelet/chat-120_ft_task-1 datasets. It achieves the following results on the evaluation set:

  • Loss: 1.2635
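
Since the framework versions below list PEFT, this repository holds an adapter on top of the base model rather than full weights. A minimal loading sketch, assuming standard transformers + peft usage; the generation settings are illustrative, not taken from the card:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "GaetanMichelet/Llama-31-8B_task-1_120-samples_config-4"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter

# Illustrative chat-style generation.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```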

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 150
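
These settings map directly onto transformers TrainingArguments. A minimal sketch for reproducing the configuration; output_dir is a placeholder, and the Adam betas/epsilon above correspond to the AdamW defaults:

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="llama-31-8b_task-1_config-4",
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,  # effective train batch size of 16
    num_train_epochs=150,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,      # matches the Adam betas listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```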

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 2.121         | 0.9091  | 5    | 2.1020          |
| 2.0709        | 2.0     | 11   | 2.0931          |
| 2.0454        | 2.9091  | 16   | 2.0755          |
| 2.0502        | 4.0     | 22   | 2.0472          |
| 2.0511        | 4.9091  | 27   | 2.0100          |
| 1.9554        | 6.0     | 33   | 1.9472          |
| 1.8921        | 6.9091  | 38   | 1.8795          |
| 1.8104        | 8.0     | 44   | 1.7813          |
| 1.7636        | 8.9091  | 49   | 1.6937          |
| 1.6011        | 10.0    | 55   | 1.6142          |
| 1.5128        | 10.9091 | 60   | 1.5751          |
| 1.4277        | 12.0    | 66   | 1.5353          |
| 1.4998        | 12.9091 | 71   | 1.5001          |
| 1.4154        | 14.0    | 77   | 1.4583          |
| 1.4201        | 14.9091 | 82   | 1.4252          |
| 1.3364        | 16.0    | 88   | 1.3921          |
| 1.2762        | 16.9091 | 93   | 1.3691          |
| 1.2851        | 18.0    | 99   | 1.3437          |
| 1.2239        | 18.9091 | 104  | 1.3261          |
| 1.221         | 20.0    | 110  | 1.3084          |
| 1.2011        | 20.9091 | 115  | 1.2951          |
| 1.1433        | 22.0    | 121  | 1.2824          |
| 1.1579        | 22.9091 | 126  | 1.2746          |
| 1.0871        | 24.0    | 132  | 1.2680          |
| 1.0745        | 24.9091 | 137  | 1.2635          |
| 1.0006        | 26.0    | 143  | 1.2674          |
| 0.9628        | 26.9091 | 148  | 1.2689          |
| 0.9237        | 28.0    | 154  | 1.2717          |
| 0.8824        | 28.9091 | 159  | 1.2880          |
| 0.8706        | 30.0    | 165  | 1.2961          |
| 0.8328        | 30.9091 | 170  | 1.3266          |
| 0.7667        | 32.0    | 176  | 1.3447          |
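
Although 150 epochs were configured, the log ends at epoch 32, and the reported loss of 1.2635 is the minimum reached at epoch ~24.9. This is consistent with early stopping plus best-checkpoint restoration, though the card does not state it. A sketch of that pattern with transformers; the patience value is an assumption, not documented in the card:

```python
from transformers import EarlyStoppingCallback, TrainingArguments

# Early stopping needs per-epoch evaluation and best-checkpoint tracking.
args = TrainingArguments(
    output_dir="out",
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

# Hypothetical patience; the value used for this run is not documented.
early_stop = EarlyStoppingCallback(early_stopping_patience=7)
# Pass to the trainer: Trainer(..., args=args, callbacks=[early_stop])
```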

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • Pytorch 2.1.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1