mav23 committed on
Commit
047800d
1 Parent(s): 701d80c

Upload folder using huggingface_hub

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +138 -0
  3. gemma-2-9b-it-simpo.Q4_0.gguf +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ gemma-2-9b-it-simpo.Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,138 @@
+ ---
+ base_model: google/gemma-2-9b-it
+ tags:
+ - alignment-handbook
+ - generated_from_trainer
+ datasets:
+ - princeton-nlp/gemma2-ultrafeedback-armorm
+ model-index:
+ - name: princeton-nlp/gemma-2-9b-it-SimPO
+   results: []
+ license: mit
+ ---
+
+ # gemma-2-9b-it-SimPO Model Card
+
+ SimPO (Simple Preference Optimization) is an offline preference optimization algorithm designed to enhance the training of large language models (LLMs) with preference optimization datasets. SimPO aligns the reward function with the generation likelihood, eliminating the need for a reference model and incorporating a target reward margin to boost performance. Please refer to our [preprint](https://arxiv.org/pdf/2405.14734) and [GitHub repo](https://github.com/princeton-nlp/SimPO) for more details.
+
+
+ ## Model Details
+
+ ### Model Description
+
+ We fine-tuned [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) on [princeton-nlp/gemma2-ultrafeedback-armorm](https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm) with the SimPO objective.
+
+ - **Developed by:** Yu Meng, Mengzhou Xia, Danqi Chen
+ - **Model type:** Causal Language Model
+ - **License:** gemma
+ - **Finetuned from model:** [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it)
+
+ ### Model Sources
+
+ - **Repository:** https://github.com/princeton-nlp/SimPO
+ - **Paper:** https://arxiv.org/pdf/2405.14734
+
+
+ ## How to Get Started with the Model
+ ```python
+ import torch
+ from transformers import pipeline
+
+ model_id = "princeton-nlp/gemma-2-9b-it-SimPO"
+
+ generator = pipeline(
+     "text-generation",
+     model=model_id,
+     model_kwargs={"torch_dtype": torch.bfloat16},
+     device="cuda",
+ )
+ outputs = generator([{"role": "user", "content": "What's the difference between llamas and alpacas?"}],
+                     do_sample=False,
+                     eos_token_id=[generator.tokenizer.convert_tokens_to_ids("<end_of_turn>"), generator.tokenizer.eos_token_id],
+                     max_new_tokens=200)
+ print(outputs[0]['generated_text'])
+ ```
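+
+ The example above loads the full-precision checkpoint with Transformers. Since this repository also contains a Q4_0 GGUF quantization (`gemma-2-9b-it-simpo.Q4_0.gguf`), a minimal sketch using the `llama-cpp-python` bindings is shown below; it assumes the GGUF file has already been downloaded locally and is illustrative rather than an official usage recipe.
+
+ ```python
+ # Illustrative sketch: assumes `pip install llama-cpp-python` and a local copy of
+ # the Q4_0 GGUF file from this repository.
+ from llama_cpp import Llama
+
+ llm = Llama(
+     model_path="gemma-2-9b-it-simpo.Q4_0.gguf",  # path to the quantized weights
+     n_ctx=4096,        # context window to allocate
+     n_gpu_layers=-1,   # offload all layers to the GPU if built with GPU support
+ )
+
+ out = llm.create_chat_completion(
+     messages=[{"role": "user", "content": "What's the difference between llamas and alpacas?"}],
+     max_tokens=200,
+ )
+ print(out["choices"][0]["message"]["content"])
+ ```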
+
+ ## Training Details
+
+ ### Training Data
+
+ We use [princeton-nlp/gemma2-ultrafeedback-armorm](https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm) as the preference optimization dataset.
+
+ #### Training Hyperparameters
+
+ The hyperparameters used can be found in the [training script](https://github.com/princeton-nlp/SimPO/blob/main/training_configs/gemma-2-9b-it-simpo.yaml).
+
+ #### Speeds, Sizes, Times
+
+ Fine-tuning [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) on [princeton-nlp/gemma2-ultrafeedback-armorm](https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm) takes around 100 minutes on 8xH100 GPUs.
+
+ ## Evaluation Results
+
+ In the table, AE2 = AlpacaEval 2 (LC = length-controlled win rate, WR = raw win rate), AH = Arena-Hard, and GSM = GSM8K; the Length columns report average response length.
+
+ | Model | AE2 LC | AE2 WR | AE2 Length | AH | AH Length | GSM | GSM Length | MMLU | MMLU Length |
+ |-----------------------------------|:------:|:------:|:----------:|:----:|:---------:|:----:|:----------:|:----:|:-----------:|
+ | [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) | 51.1 | 38.1 | 1571 | 40.8 | 545 | 87.4 | 395 | 72.7 | 515 |
+ | [princeton-nlp/gemma-2-9b-it-DPO](https://huggingface.co/princeton-nlp/gemma-2-9b-it-DPO) | 67.8 | 65.4 | 2016 | 58.9 | 717 | 88.5 | 392 | 72.2 | 624 |
+ | [princeton-nlp/gemma-2-9b-it-SimPO](https://huggingface.co/princeton-nlp/gemma-2-9b-it-SimPO) | 72.4 | 65.9 | 1833 | 59.1 | 693 | 88.0 | 341 | 72.2 | 441 |
+
+
+ ## Technical Specifications
+
+ ### Model Architecture and Objective
+
+ The model architecture is based on [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it). We use the SimPO training objective proposed in our [preprint](https://arxiv.org/pdf/2405.14734).
+
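+ For intuition, the sketch below restates the reference-free SimPO loss from the preprint in a few lines of PyTorch; it is not the training code used for this model, and the default `beta`/`gamma` values are illustrative placeholders rather than the exact hyperparameters from the training config.
+
+ ```python
+ # Schematic SimPO loss: the implicit reward is the length-normalized log-likelihood
+ # of a response, and the loss prefers the chosen response over the rejected one by
+ # a target margin gamma, with no reference model involved.
+ import torch.nn.functional as F
+
+ def simpo_loss(avg_logp_chosen, avg_logp_rejected, beta=10.0, gamma=5.0):
+     # Inputs are mean per-token log-probabilities under the policy model,
+     # i.e. log pi_theta(y | x) / |y|, so reward(x, y) = (beta / |y|) * log pi_theta(y | x).
+     reward_chosen = beta * avg_logp_chosen
+     reward_rejected = beta * avg_logp_rejected
+     # Bradley-Terry style objective with a target reward margin gamma.
+     return -F.logsigmoid(reward_chosen - reward_rejected - gamma).mean()
+ ```
+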
+ #### Hardware
+
+ We used 8xH100 GPUs for model training.
+
+ #### Software
+
+ Training was done using the [alignment-handbook](https://github.com/huggingface/alignment-handbook) library.
+
+ ## Citation
+
+ Gemma model:
+ ```bibtex
+ @article{gemma_2024,
+   title={Gemma},
+   url={https://www.kaggle.com/m/3301},
+   DOI={10.34740/KAGGLE/M/3301},
+   publisher={Kaggle},
+   author={Gemma Team},
+   year={2024}
+ }
+ ```
+
+ SimPO paper:
+ ```bibtex
+ @article{meng2024simpo,
+   title={{SimPO}: Simple preference optimization with a reference-free reward},
+   author={Meng, Yu and Xia, Mengzhou and Chen, Danqi},
+   journal={arXiv preprint arXiv:2405.14734},
+   year={2024}
+ }
+ ```
+
+ UltraFeedback paper:
+ ```bibtex
+ @article{cui2023ultrafeedback,
+   title={{UltraFeedback}: Boosting language models with high-quality feedback},
+   author={Cui, Ganqu and Yuan, Lifan and Ding, Ning and Yao, Guanming and Zhu, Wei and Ni, Yuan and Xie, Guotong and Liu, Zhiyuan and Sun, Maosong},
+   journal={arXiv preprint arXiv:2310.01377},
+   year={2023}
+ }
+ ```
+
+ ArmoRM paper:
+ ```bibtex
+ @article{wang2024interpretable,
+   title={Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts},
+   author={Wang, Haoxiang and Xiong, Wei and Xie, Tengyang and Zhao, Han and Zhang, Tong},
+   journal={arXiv preprint arXiv:2406.12845},
+   year={2024}
+ }
+ ```
gemma-2-9b-it-simpo.Q4_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5118ef6e3d0bced3bf24b5dc3d9c3d11d8c4f10aaf22d636ed88e4a32fca6966
+ size 5443143040