---
license: apache-2.0
library_name: peft
tags:
  - trl
  - sft
  - generated_from_trainer
base_model: mistralai/Mixtral-8x7B-v0.1
model-index:
  - name: results_mixtral_sft
    results: []
---

# results_mixtral_sft

This model is a fine-tuned version of [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 0.2331
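
Below is a minimal loading sketch (not part of the original card): it assumes the PEFT/LoRA adapter produced by this run is available locally or on the Hub, and that enough GPU memory (or additional quantization) is available for the Mixtral-8x7B base model. The adapter path and the prompt are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mixtral-8x7B-v0.1"
adapter_id = "results_mixtral_sft"  # placeholder: point this at the actual adapter repo or local folder

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,  # assumption: half precision to reduce memory use
    device_map="auto",
)
# Attach the fine-tuned LoRA adapter on top of the frozen base model
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```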

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a training-configuration sketch is shown after this list):

- learning_rate: 2e-05
- train_batch_size: 10
- eval_batch_size: 10
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 20
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 25
- num_epochs: 100
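
The original training script is not included in this repository; the sketch below is a reconstruction of how these hyperparameters could map onto `TrainingArguments` and `trl`'s `SFTTrainer` (consistent with the `trl`/`sft` tags above). The dataset, LoRA settings, sequence length, and evaluation strategy are assumptions or placeholders, since the card does not document them.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig
from trl import SFTTrainer

base_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Placeholder data: the actual training/evaluation dataset is not documented in this card.
train_dataset = Dataset.from_dict({"text": ["example training sample"]})
eval_dataset = Dataset.from_dict({"text": ["example evaluation sample"]})

args = TrainingArguments(
    output_dir="results_mixtral_sft",
    learning_rate=2e-5,                # learning_rate
    per_device_train_batch_size=10,    # train_batch_size
    per_device_eval_batch_size=10,     # eval_batch_size
    gradient_accumulation_steps=2,     # total_train_batch_size = 10 * 2 = 20
    warmup_steps=25,                   # lr_scheduler_warmup_steps
    num_train_epochs=100,              # num_epochs
    lr_scheduler_type="linear",
    seed=42,
    evaluation_strategy="epoch",       # assumption: the table below reports one eval loss per epoch
)

# Placeholder LoRA configuration: the actual adapter settings are not documented in this card.
peft_config = LoraConfig(task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=peft_config,
    dataset_text_field="text",         # assumption about the dataset schema
    max_seq_length=512,                # assumption
    tokenizer=tokenizer,
)
trainer.train()
```

Note that the default optimizer for `TrainingArguments` is AdamW with betas=(0.9, 0.999) and epsilon=1e-08, which matches the optimizer listed above.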

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log | 1.0 | 1 | 2.4533 |
| No log | 2.0 | 2 | 2.4493 |
| No log | 3.0 | 3 | 2.4436 |
| No log | 4.0 | 4 | 2.4352 |
| No log | 5.0 | 5 | 2.4249 |
| No log | 6.0 | 6 | 2.4215 |
| No log | 7.0 | 7 | 2.4047 |
| No log | 8.0 | 8 | 2.3842 |
| No log | 9.0 | 9 | 2.3561 |
| No log | 10.0 | 10 | 2.3295 |
| No log | 11.0 | 11 | 2.3004 |
| No log | 12.0 | 12 | 2.2563 |
| No log | 13.0 | 13 | 2.2130 |
| No log | 14.0 | 14 | 2.1715 |
| No log | 15.0 | 15 | 2.1203 |
| No log | 16.0 | 16 | 2.0893 |
| No log | 17.0 | 17 | 2.0458 |
| No log | 18.0 | 18 | 1.9937 |
| No log | 19.0 | 19 | 1.9469 |
| No log | 20.0 | 20 | 1.9085 |
| No log | 21.0 | 21 | 1.9413 |
| No log | 22.0 | 22 | 1.8690 |
| No log | 23.0 | 23 | 1.8139 |
| No log | 24.0 | 24 | 1.7389 |
| 1.0996 | 25.0 | 25 | 1.6836 |
| 1.0996 | 26.0 | 26 | 1.6236 |
| 1.0996 | 27.0 | 27 | 1.5705 |
| 1.0996 | 28.0 | 28 | 1.5261 |
| 1.0996 | 29.0 | 29 | 1.4790 |
| 1.0996 | 30.0 | 30 | 1.4240 |
| 1.0996 | 31.0 | 31 | 1.3674 |
| 1.0996 | 32.0 | 32 | 1.3182 |
| 1.0996 | 33.0 | 33 | 1.2769 |
| 1.0996 | 34.0 | 34 | 1.2321 |
| 1.0996 | 35.0 | 35 | 1.1885 |
| 1.0996 | 36.0 | 36 | 1.1445 |
| 1.0996 | 37.0 | 37 | 1.0878 |
| 1.0996 | 38.0 | 38 | 1.0237 |
| 1.0996 | 39.0 | 39 | 0.9748 |
| 1.0996 | 40.0 | 40 | 0.9294 |
| 1.0996 | 41.0 | 41 | 0.8806 |
| 1.0996 | 42.0 | 42 | 0.8457 |
| 1.0996 | 43.0 | 43 | 0.7969 |
| 1.0996 | 44.0 | 44 | 0.7599 |
| 1.0996 | 45.0 | 45 | 0.7189 |
| 1.0996 | 46.0 | 46 | 0.6952 |
| 1.0996 | 47.0 | 47 | 0.6570 |
| 1.0996 | 48.0 | 48 | 0.6316 |
| 1.0996 | 49.0 | 49 | 0.6212 |
| 0.548 | 50.0 | 50 | 0.5764 |
| 0.548 | 51.0 | 51 | 0.5113 |
| 0.548 | 52.0 | 52 | 0.4868 |
| 0.548 | 53.0 | 53 | 0.4585 |
| 0.548 | 54.0 | 54 | 0.4334 |
| 0.548 | 55.0 | 55 | 0.4208 |
| 0.548 | 56.0 | 56 | 0.4087 |
| 0.548 | 57.0 | 57 | 0.3945 |
| 0.548 | 58.0 | 58 | 0.3722 |
| 0.548 | 59.0 | 59 | 0.3588 |
| 0.548 | 60.0 | 60 | 0.3414 |
| 0.548 | 61.0 | 61 | 0.3235 |
| 0.548 | 62.0 | 62 | 0.3157 |
| 0.548 | 63.0 | 63 | 0.3050 |
| 0.548 | 64.0 | 64 | 0.2969 |
| 0.548 | 65.0 | 65 | 0.2893 |
| 0.548 | 66.0 | 66 | 0.2802 |
| 0.548 | 67.0 | 67 | 0.2746 |
| 0.548 | 68.0 | 68 | 0.2688 |
| 0.548 | 69.0 | 69 | 0.2643 |
| 0.548 | 70.0 | 70 | 0.2581 |
| 0.548 | 71.0 | 71 | 0.2523 |
| 0.548 | 72.0 | 72 | 0.2490 |
| 0.548 | 73.0 | 73 | 0.2468 |
| 0.548 | 74.0 | 74 | 0.2404 |
| 0.1741 | 75.0 | 75 | 0.2394 |
| 0.1741 | 76.0 | 76 | 0.2382 |
| 0.1741 | 77.0 | 77 | 0.2373 |
| 0.1741 | 78.0 | 78 | 0.2366 |
| 0.1741 | 79.0 | 79 | 0.2361 |
| 0.1741 | 80.0 | 80 | 0.2358 |
| 0.1741 | 81.0 | 81 | 0.2355 |
| 0.1741 | 82.0 | 82 | 0.2352 |
| 0.1741 | 83.0 | 83 | 0.2350 |
| 0.1741 | 84.0 | 84 | 0.2348 |
| 0.1741 | 85.0 | 85 | 0.2345 |
| 0.1741 | 86.0 | 86 | 0.2343 |
| 0.1741 | 87.0 | 87 | 0.2342 |
| 0.1741 | 88.0 | 88 | 0.2340 |
| 0.1741 | 89.0 | 89 | 0.2339 |
| 0.1741 | 90.0 | 90 | 0.2337 |
| 0.1741 | 91.0 | 91 | 0.2336 |
| 0.1741 | 92.0 | 92 | 0.2335 |
| 0.1741 | 93.0 | 93 | 0.2334 |
| 0.1741 | 94.0 | 94 | 0.2333 |
| 0.1741 | 95.0 | 95 | 0.2333 |
| 0.1741 | 96.0 | 96 | 0.2332 |
| 0.1741 | 97.0 | 97 | 0.2331 |
| 0.1741 | 98.0 | 98 | 0.2331 |
| 0.1741 | 99.0 | 99 | 0.2331 |
| 0.1174 | 100.0 | 100 | 0.2331 |

### Framework versions

- PEFT 0.8.1
- Transformers 4.37.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1