---
license: apache-2.0
library_name: peft
tags:
  - trl
  - sft
  - generated_from_trainer
datasets:
  - generator
base_model: mistralai/Mixtral-8x7B-v0.1
model-index:
  - name: Project5_V3_Mistral8x7b_V2.2.5
    results: []
---

# Project5_V3_Mistral8x7b_V2.2.5

This model is a fine-tuned version of [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) on the generator dataset. It achieves the following results on the evaluation set:

- Loss: 2.2430
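
A minimal loading sketch with Transformers and PEFT is shown below. The adapter repo id is an assumption inferred from the model name, not a confirmed path; substitute the actual location of the adapter weights.

```python
# Minimal loading sketch; the adapter repo id below is a hypothetical placeholder.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mixtral-8x7B-v0.1"
adapter_id = "dominic5/Project5_V3_Mistral8x7b_V2.2.5"  # hypothetical

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the `accelerate` package
)
# Attach the fine-tuned LoRA adapter on top of the frozen base model.
model = PeftModel.from_pretrained(model, adapter_id)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```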

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 128
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 40
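
The `trl` and `sft` tags suggest training went through TRL's `SFTTrainer`; the sketch below shows roughly how these hyperparameters map onto it. The dataset, LoRA configuration, and sequence length are placeholders, since the original training script and adapter config are not recorded in this card.

```python
# Hypothetical reconstruction of the training setup from the hyperparameters above.
from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

# Stand-in dataset for illustration; the actual "generator" dataset is not published.
dataset = load_dataset("imdb", split="train[:1%]").train_test_split(test_size=0.1)

args = TrainingArguments(
    output_dir="Project5_V3_Mistral8x7b_V2.2.5",
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=128,  # 2 * 128 = total train batch size of 256
    lr_scheduler_type="cosine",
    num_train_epochs=40,
    seed=42,
    adam_beta1=0.9,    # Adam settings as reported above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

# LoRA settings are placeholders; the card does not record the adapter config.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)

trainer = SFTTrainer(
    model="mistralai/Mixtral-8x7B-v0.1",  # SFTTrainer also accepts a model id
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    dataset_text_field="text",
    max_seq_length=1024,  # assumed; not recorded in the card
    peft_config=peft_config,
)
trainer.train()
```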

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log        | 0.63  | 1    | 2.3024          |
| No log        | 1.88  | 3    | 2.2997          |
| No log        | 2.51  | 4    | 2.2977          |
| 2.2951        | 3.76  | 6    | 2.2935          |
| 2.2951        | 4.39  | 7    | 2.2905          |
| 2.2951        | 5.65  | 9    | 2.2853          |
| 2.2845        | 6.9   | 11   | 2.2789          |
| 2.2845        | 7.53  | 12   | 2.2757          |
| 2.2845        | 8.78  | 14   | 2.2696          |
| 2.2729        | 9.41  | 15   | 2.2667          |
| 2.2729        | 10.67 | 17   | 2.2619          |
| 2.2729        | 11.92 | 19   | 2.2576          |
| 2.2599        | 12.55 | 20   | 2.2555          |
| 2.2599        | 13.8  | 22   | 2.2519          |
| 2.2599        | 14.43 | 23   | 2.2504          |
| 2.2523        | 15.69 | 25   | 2.2479          |
| 2.2523        | 16.94 | 27   | 2.2462          |
| 2.2523        | 17.57 | 28   | 2.2454          |
| 2.2471        | 18.82 | 30   | 2.2442          |
| 2.2471        | 19.45 | 31   | 2.2437          |
| 2.2471        | 20.71 | 33   | 2.2432          |
| 2.2444        | 21.96 | 35   | 2.2427          |
| 2.2444        | 22.59 | 36   | 2.2428          |
| 2.2444        | 23.84 | 38   | 2.2429          |
| 2.2444        | 24.47 | 39   | 2.2429          |
| 2.2427        | 25.1  | 40   | 2.2430          |
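
For a quick look at convergence, the validation-loss column can be plotted directly; the curve flattens near 2.243 over the final third of training. The values below are copied from the table above.

```python
# Plot the validation-loss trajectory from the training-results table.
import matplotlib.pyplot as plt

steps = [1, 3, 4, 6, 7, 9, 11, 12, 14, 15, 17, 19, 20, 22, 23,
         25, 27, 28, 30, 31, 33, 35, 36, 38, 39, 40]
val_loss = [2.3024, 2.2997, 2.2977, 2.2935, 2.2905, 2.2853, 2.2789, 2.2757,
            2.2696, 2.2667, 2.2619, 2.2576, 2.2555, 2.2519, 2.2504, 2.2479,
            2.2462, 2.2454, 2.2442, 2.2437, 2.2432, 2.2427, 2.2428, 2.2429,
            2.2429, 2.2430]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("Project5_V3_Mistral8x7b_V2.2.5: validation loss per eval step")
plt.show()
```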

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.1+cu121
- Datasets 2.17.1
- Tokenizers 0.15.2