---
license: apache-2.0
base_model: AmberYifan/mistral-safe-sft-full
tags:
- generated_from_trainer
model-index:
- name: mistral-sft4epoch-spin-v
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# mistral-sft4epoch-spin-v

This model is a fine-tuned version of [AmberYifan/mistral-safe-sft-full](https://huggingface.co/AmberYifan/mistral-safe-sft-full) on an unknown dataset.
It achieves the following results on the evaluation set (a minimal loading sketch follows the metrics):
- Loss: 0.2284
- Rewards/real: 10.1344
- Rewards/generated: -5.3158
- Rewards/accuracies: 1.0
- Rewards/margins: 15.4503
- Logps/generated: -131.8755
- Logps/real: -111.3366
- Logits/generated: -2.7694
- Logits/real: -2.7499
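
A minimal inference sketch is shown below. The repo id is inferred from the model name on this card, and the precision and generation settings are illustrative assumptions rather than values taken from training.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/mistral-sft4epoch-spin-v"  # assumed repo id, inferred from the card title

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption; adjust to your hardware
    device_map="auto",
)

prompt = "Summarize what self-play fine-tuning (SPIN) does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```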

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how they might map onto `TrainingArguments`):
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
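
As referenced above, a hedged sketch of how these values could map onto `transformers.TrainingArguments` follows; a per-device batch size of 8 across 4 GPUs accounts for the reported total batch size of 32. The `output_dir` and `bf16` settings are placeholders and assumptions not stated on this card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral-sft4epoch-spin-v",  # placeholder
    learning_rate=5e-7,
    per_device_train_batch_size=8,  # x4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,   # x4 GPUs -> total eval batch size 32
    seed=42,
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption; precision is not stated on this card
)
```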

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|:-------------:|:------:|:----:|:---------------:|:------------:|:-----------------:|:------------------:|:---------------:|:---------------:|:----------:|:----------------:|:-----------:|
| 0.278         | 0.0640 | 100  | 0.2703          | 8.6366       | -3.4251           | 0.9922             | 12.0617         | -112.9675       | -126.3148  | -2.9055          | -2.8963     |
| 0.2283        | 0.1280 | 200  | 0.2438          | 9.5699       | -4.6271           | 0.9922             | 14.1970         | -124.9880       | -116.9817  | -2.8308          | -2.8192     |
| 0.2284        | 0.1919 | 300  | 0.2384          | 9.7849       | -5.0781           | 0.9922             | 14.8630         | -129.4981       | -114.8321  | -2.8396          | -2.8204     |
| 0.2154        | 0.2559 | 400  | 0.2361          | 9.8971       | -4.8914           | 0.9922             | 14.7885         | -127.6311       | -113.7101  | -2.8303          | -2.8085     |
| 0.2368        | 0.3199 | 500  | 0.2351          | 9.9762       | -5.0488           | 0.9922             | 15.0249         | -129.2045       | -112.9195  | -2.8228          | -2.8083     |
| 0.2065        | 0.3839 | 600  | 0.2346          | 10.0426      | -4.9610           | 0.9922             | 15.0035         | -128.3267       | -112.2554  | -2.8204          | -2.8086     |
| 0.2244        | 0.4479 | 700  | 0.2317          | 10.0417      | -5.1299           | 1.0                | 15.1716         | -130.0162       | -112.2640  | -2.8203          | -2.8076     |
| 0.2161        | 0.5118 | 800  | 0.2297          | 10.0737      | -5.0565           | 1.0                | 15.1303         | -129.2824       | -111.9440  | -2.8437          | -2.8337     |
| 0.2127        | 0.5758 | 900  | 0.2302          | 10.0913      | -5.0905           | 1.0                | 15.1818         | -129.6217       | -111.7683  | -2.8251          | -2.8150     |
| 0.2017        | 0.6398 | 1000 | 0.2298          | 10.1245      | -5.2627           | 1.0                | 15.3872         | -131.3441       | -111.4362  | -2.7955          | -2.7831     |
| 0.2152        | 0.7038 | 1100 | 0.2297          | 10.0889      | -5.3503           | 1.0                | 15.4392         | -132.2204       | -111.7925  | -2.7790          | -2.7609     |
| 0.2074        | 0.7678 | 1200 | 0.2298          | 10.1143      | -5.3204           | 1.0                | 15.4346         | -131.9209       | -111.5385  | -2.7919          | -2.7734     |
| 0.2107        | 0.8317 | 1300 | 0.2287          | 10.1349      | -5.3137           | 1.0                | 15.4486         | -131.8539       | -111.3324  | -2.7734          | -2.7524     |
| 0.1947        | 0.8957 | 1400 | 0.2288          | 10.1265      | -5.3252           | 1.0                | 15.4517         | -131.9686       | -111.4160  | -2.7803          | -2.7613     |
| 0.2056        | 0.9597 | 1500 | 0.2284          | 10.1344      | -5.3158           | 1.0                | 15.4503         | -131.8755       | -111.3366  | -2.7694          | -2.7499     |
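
The reward columns above follow the convention of SPIN/DPO-style preference training, where the implicit reward of a response is `beta * (log-prob under the policy - log-prob under a frozen reference model)`, computed separately for the "real" (ground-truth) and "generated" (self-play) responses. Below is a minimal sketch of that bookkeeping, assuming a DPO-style objective and an illustrative `beta`; this card does not state the actual loss or its coefficients.

```python
import torch
import torch.nn.functional as F

def spin_style_metrics(policy_logps_real, ref_logps_real,
                       policy_logps_generated, ref_logps_generated,
                       beta=0.1):  # beta is an illustrative assumption
    """Takes per-example sequence log-probs; returns loss and reward metrics."""
    rewards_real = beta * (policy_logps_real - ref_logps_real)
    rewards_generated = beta * (policy_logps_generated - ref_logps_generated)
    margins = rewards_real - rewards_generated
    loss = -F.logsigmoid(margins).mean()  # DPO-style preference loss
    return {
        "loss": loss,
        "rewards/real": rewards_real.mean(),
        "rewards/generated": rewards_generated.mean(),
        "rewards/margins": margins.mean(),
        "rewards/accuracies": (margins > 0).float().mean(),
    }

# Example call with arbitrary dummy log-probs:
metrics = spin_style_metrics(torch.tensor([-110.0]), torch.tensor([-210.0]),
                             torch.tensor([-130.0]), torch.tensor([-80.0]))
```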


### Framework versions

- Transformers 4.43.3
- Pytorch 2.2.2+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1