---
license: mit
base_model: ZhangShenao/SELM-Llama-3-8B-Instruct-iter-2
tags:
- alignment-handbook
- dpo
- trl
- selm
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: SELM-Llama-3-8B-Instruct-iter-3
  results: []
---



This model was developed with the Self-Exploring Language Models (SELM) method introduced in [Self-Exploring Language Models: Active Preference Elicitation for Online Alignment](https://arxiv.org/abs/2405.19332).



# SELM-Llama-3-8B-Instruct-iter-3



This model is a fine-tuned version of [ZhangShenao/SELM-Llama-3-8B-Instruct-iter-2](https://huggingface.co/ZhangShenao/SELM-Llama-3-8B-Instruct-iter-2), trained with synthetic data based on the HuggingFaceH4/ultrafeedback_binarized dataset.
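
For reference, the underlying preference data can be inspected directly from the Hub. The sketch below is a minimal example; the split and column names are assumptions taken from the public dataset card, not from this model card.

```python
from datasets import load_dataset

# Minimal look at the binarized UltraFeedback preference pairs.
# The split name "train_prefs" and the column layout are assumptions;
# check the dataset card for the exact schema.
prefs = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

example = prefs[0]
print(example["prompt"])
print(example["chosen"][-1]["content"])    # preferred response
print(example["rejected"][-1]["content"])  # dispreferred response
```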



## Model description



- Model type: An 8B-parameter, Llama-3-Instruct-based Self-Exploring Language Model (SELM).
- License: MIT
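
A minimal inference sketch with 🤗 Transformers follows; the dtype, device placement, and generation settings are illustrative assumptions, not the configuration used for the reported results.

```python
import torch
from transformers import pipeline

# Illustrative usage sketch only: generation settings are assumptions.
generator = pipeline(
    "text-generation",
    model="ZhangShenao/SELM-Llama-3-8B-Instruct-iter-3",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain direct preference optimization in two sentences."},
]
outputs = generator(messages, max_new_tokens=256, do_sample=False)
print(outputs[0]["generated_text"][-1]["content"])  # assistant reply
```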



## Results



| Model                                  | AlpacaEval 2.0 (LC WR) | MT-Bench (Average) |
|----------------------------------------|-----------------------:|-------------------:|
| [SELM-Llama-3-8B-Instruct-iter-3](https://huggingface.co/ZhangShenao/SELM-Llama-3-8B-Instruct-iter-3) | 33.47 | 8.29 |
| [SELM-Llama-3-8B-Instruct-iter-2](https://huggingface.co/ZhangShenao/SELM-Llama-3-8B-Instruct-iter-2) | 35.65 | 8.09 |
| [SELM-Llama-3-8B-Instruct-iter-1](https://huggingface.co/ZhangShenao/SELM-Llama-3-8B-Instruct-iter-1) | 32.02 | 7.92 |
| [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | 24.31 | 7.93 |

Our model also ranks highly on [WildBench](https://huggingface.co/spaces/allenai/WildBench)! 🔥

### Training hyperparameters

The following hyperparameters were used during training:
- alpha: 0.0001
- beta: 0.01
- train_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 1
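
To make these values concrete, the sketch below shows one way they could be expressed as a TRL `DPOConfig`. This is an assumption about tooling rather than the authors' actual training script; in particular, the optimism coefficient `alpha` is SELM-specific and has no field in stock `DPOTrainer`, so it is only noted in a comment.

```python
from trl import DPOConfig

# Hypothetical mapping of the reported hyperparameters onto a TRL DPO configuration.
# Effective batch size: 4 (per device) x 8 (GPUs) x 4 (grad accumulation) = 128.
training_args = DPOConfig(
    output_dir="selm-llama-3-8b-instruct-iter-3",  # assumed output path
    beta=0.01,                       # preference-optimization KL coefficient from the list above
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
# alpha=0.0001 is the SELM optimism coefficient; it requires the authors' SELM
# trainer and has no equivalent field in stock DPOConfig/DPOTrainer.
```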

### Framework versions

- Transformers 4.40.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1