metadata

license: apache-2.0
library_name: peft
tags:
  - trl
  - sft
  - generated_from_trainer
base_model: mistralai/Mixtral-8x7B-v0.1
model-index:
  - name: results_mixtral_sft
    results: []

results_mixtral_sft

This model is a PEFT adapter fine-tuned from mistralai/Mixtral-8x7B-v0.1 on an unspecified dataset (no dataset name was recorded for the training run). It achieves the following results on the evaluation set:

Loss: 0.2331

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 10
eval_batch_size: 10
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 20
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 25
num_epochs: 100
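
The relationship between these values can be checked with a short sketch. It assumes a single device (consistent with 10 × 2 = 20) and 100 total optimizer steps (as the results table suggests). The helper names `effective_batch_size` and `linear_lr` are illustrative, not taken from the training code; `linear_lr` follows the usual linear warmup-then-decay rule.

```python
def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Number of examples contributing to one optimizer step."""
    return per_device * grad_accum * num_devices


def linear_lr(step, base_lr=2e-5, warmup_steps=25, total_steps=100):
    """Learning rate at a given optimizer step under linear warmup and linear decay.

    Assumption: total_steps=100 is inferred from the results table,
    not stated in the hyperparameter list.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(10, 2))
    for step in (0, 10, 25, 50, 100):
        print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at step 25 and reaches zero at step 100, which lines up with the validation loss flattening out over the final epochs.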

Training results

Training Loss	Epoch	Step	Validation Loss
No log	1.0	1	2.4533
No log	2.0	2	2.4493
No log	3.0	3	2.4436
No log	4.0	4	2.4352
No log	5.0	5	2.4249
No log	6.0	6	2.4215
No log	7.0	7	2.4047
No log	8.0	8	2.3842
No log	9.0	9	2.3561
No log	10.0	10	2.3295
No log	11.0	11	2.3004
No log	12.0	12	2.2563
No log	13.0	13	2.2130
No log	14.0	14	2.1715
No log	15.0	15	2.1203
No log	16.0	16	2.0893
No log	17.0	17	2.0458
No log	18.0	18	1.9937
No log	19.0	19	1.9469
No log	20.0	20	1.9085
No log	21.0	21	1.9413
No log	22.0	22	1.8690
No log	23.0	23	1.8139
No log	24.0	24	1.7389
1.0996	25.0	25	1.6836
1.0996	26.0	26	1.6236
1.0996	27.0	27	1.5705
1.0996	28.0	28	1.5261
1.0996	29.0	29	1.4790
1.0996	30.0	30	1.4240
1.0996	31.0	31	1.3674
1.0996	32.0	32	1.3182
1.0996	33.0	33	1.2769
1.0996	34.0	34	1.2321
1.0996	35.0	35	1.1885
1.0996	36.0	36	1.1445
1.0996	37.0	37	1.0878
1.0996	38.0	38	1.0237
1.0996	39.0	39	0.9748
1.0996	40.0	40	0.9294
1.0996	41.0	41	0.8806
1.0996	42.0	42	0.8457
1.0996	43.0	43	0.7969
1.0996	44.0	44	0.7599
1.0996	45.0	45	0.7189
1.0996	46.0	46	0.6952
1.0996	47.0	47	0.6570
1.0996	48.0	48	0.6316
1.0996	49.0	49	0.6212
0.548	50.0	50	0.5764
0.548	51.0	51	0.5113
0.548	52.0	52	0.4868
0.548	53.0	53	0.4585
0.548	54.0	54	0.4334
0.548	55.0	55	0.4208
0.548	56.0	56	0.4087
0.548	57.0	57	0.3945
0.548	58.0	58	0.3722
0.548	59.0	59	0.3588
0.548	60.0	60	0.3414
0.548	61.0	61	0.3235
0.548	62.0	62	0.3157
0.548	63.0	63	0.3050
0.548	64.0	64	0.2969
0.548	65.0	65	0.2893
0.548	66.0	66	0.2802
0.548	67.0	67	0.2746
0.548	68.0	68	0.2688
0.548	69.0	69	0.2643
0.548	70.0	70	0.2581
0.548	71.0	71	0.2523
0.548	72.0	72	0.2490
0.548	73.0	73	0.2468
0.548	74.0	74	0.2404
0.1741	75.0	75	0.2394
0.1741	76.0	76	0.2382
0.1741	77.0	77	0.2373
0.1741	78.0	78	0.2366
0.1741	79.0	79	0.2361
0.1741	80.0	80	0.2358
0.1741	81.0	81	0.2355
0.1741	82.0	82	0.2352
0.1741	83.0	83	0.2350
0.1741	84.0	84	0.2348
0.1741	85.0	85	0.2345
0.1741	86.0	86	0.2343
0.1741	87.0	87	0.2342
0.1741	88.0	88	0.2340
0.1741	89.0	89	0.2339
0.1741	90.0	90	0.2337
0.1741	91.0	91	0.2336
0.1741	92.0	92	0.2335
0.1741	93.0	93	0.2334
0.1741	94.0	94	0.2333
0.1741	95.0	95	0.2333
0.1741	96.0	96	0.2332
0.1741	97.0	97	0.2331
0.1741	98.0	98	0.2331
0.1741	99.0	99	0.2331
0.1174	100.0	100	0.2331
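
The table can be read programmatically to see where the validation loss levels off. Below is a minimal sketch over an excerpt of the rows above. The helper names `parse_rows` and `first_step_within` and the 0.01 tolerance are illustrative choices. Rows marked "No log" are treated as having no training loss yet, which would be consistent with training loss being logged every 25 steps (an inference from the table, not a documented setting).

```python
# Excerpt of the tab-separated results table (not the full 100 rows).
ROWS = """\
No log\t1.0\t1\t2.4533
No log\t24.0\t24\t1.7389
1.0996\t25.0\t25\t1.6836
0.548\t50.0\t50\t0.5764
0.548\t73.0\t73\t0.2468
0.548\t74.0\t74\t0.2404
0.1741\t75.0\t75\t0.2394
0.1174\t100.0\t100\t0.2331
"""


def parse_rows(text):
    """Parse tab-separated rows into dicts; 'No log' becomes None."""
    rows = []
    for line in text.strip().splitlines():
        train, epoch, step, val = line.split("\t")
        rows.append({
            "train_loss": None if train == "No log" else float(train),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val),
        })
    return rows


def first_step_within(rows, tolerance):
    """First step whose validation loss is within `tolerance` of the final value."""
    final = rows[-1]["val_loss"]
    for row in rows:
        if row["val_loss"] - final <= tolerance:
            return row["step"]
    return None


if __name__ == "__main__":
    rows = parse_rows(ROWS)
    print(first_step_within(rows, tolerance=0.01))
```

In every row the epoch number equals the step number, so each epoch is a single optimizer step. With a total batch size of 20, this suggests that the training set held at most 20 examples.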

Framework versions

PEFT 0.8.1
Transformers 4.37.2
PyTorch 2.1.0+cu121
Datasets 2.16.1
Tokenizers 0.15.1