Update README.md
README.md CHANGED
|
# Model Card for Phoenix

**Phoenix** is a model trained with Direct Preference Optimization (DPO) for the German language. Its training procedure follows the process of the alignment-handbook from Hugging Face.
In contrast to Zephyr and Notus, this model has been trained on German instruction and DPO data. In detail, German translations of HuggingFaceH4/ultrachat_200k
and HuggingFaceH4/ultrafeedback_binarized were created in addition to a series of already available instruction datasets. The LLM haoranxu/ALMA-13B was used for the translation.
While the Mistral model performs really well, it is not well suited to the German language. Therefore we have used the fantastic LeoLM/leo-mistral-hessianai-7b.
Thanks to this new type of training, Phoenix is not only able to compete with the Mistral model from LeoLM but also **beats the Llama-70b-chat model in 2 MT-Bench categories**.
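For illustration, each DPO training example pairs a prompt with a preferred and a dispreferred answer; the made-up German record below only sketches that `prompt`/`chosen`/`rejected` structure and is not taken from the translated datasets:

```python
# Illustrative only: the German text below is invented to show the
# prompt/chosen/rejected structure that DPO training expects.
preference_example = {
    "prompt": "Erkläre kurz den Unterschied zwischen Wetter und Klima.",
    "chosen": "Wetter beschreibt den kurzfristigen Zustand der Atmosphäre, "
              "Klima dagegen die langfristige Statistik des Wetters über Jahrzehnte.",
    "rejected": "Wetter und Klima sind im Grunde dasselbe.",
}
```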
This model **wouldn't have been possible without the amazing work of Hugging Face, LeoLM, openbnb, argilla, the Alma-Team and many others in the AI community**.
I would like to personally thank all AI researchers who make the training of such models possible.

## MT-Bench-DE Scores
### Model Sources
- **Repository:** -
- **Paper:** [`PHOENIX: Open-Source Language Adaption for Direct Preference Optimization`](https://arxiv.org/abs/2401.10580)
- **Demo:** -

## Training Details
You will first need to install `transformers` and `accelerate` (just to ease the device placement). Then you can run the example below:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("DRXD1000/Phoenix", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("DRXD1000/Phoenix")
# Zephyr-style prompt; the user question below is only a placeholder.
prompt = """<|system|>
</s>
<|user|>
Was ist Direct Preference Optimization?</s>
<|assistant|>
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)  # illustrative settings
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
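Assuming the tokenizer ships a chat template for this Zephyr-style format (not verified here; fall back to the manual prompt above otherwise), the same prompt can also be built with `apply_chat_template`:

```python
messages = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "Was ist Direct Preference Optimization?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```
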
## Ethical Considerations and Limitations
As with all LLMs, the potential outputs of `DRXD1000/Phoenix` cannot be predicted
in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses
to user prompts. Therefore, before deploying any applications of `DRXD1000/Phoenix`, developers should
perform safety testing and tuning tailored to their specific applications of the model.
Please see Meta's [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/).

### Training hyperparameters
The following hyperparameters were used during training:

#### SFT Training
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 512
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 1
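As a rough sketch (not the exact training script; the run follows the alignment-handbook recipes), the SFT values above correspond to a `transformers.TrainingArguments` setup along these lines. `output_dir` and `bf16` are assumptions, and the total train batch size of 512 comes from 32 per device × 8 GPUs × 2 gradient-accumulation steps when launched with `accelerate`:

```python
from transformers import TrainingArguments

# Sketch of the SFT stage settings listed above; output_dir and bf16 are assumptions.
sft_args = TrainingArguments(
    output_dir="phoenix-sft",          # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,     # 32 per device * 8 GPUs * 2 steps = 512 total
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    seed=42,
    bf16=True,                         # assumption
)
```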

#### DPO Training
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
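Similarly, a minimal sketch of how the DPO stage could be set up with `trl`'s `DPOTrainer` (API as of trl 0.7.x), using the values listed above. The checkpoint path, dataset name, `beta` and `bf16` are assumptions for illustration, not the exact Phoenix recipe:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# Placeholders: the SFT checkpoint and the German ultrafeedback translation
# are not published under these names.
sft_model = AutoModelForCausalLM.from_pretrained("path/to/phoenix-sft")
sft_tokenizer = AutoTokenizer.from_pretrained("path/to/phoenix-sft")
train_dataset = load_dataset("path/to/ultrafeedback-binarized-german", split="train")

dpo_args = TrainingArguments(
    output_dir="phoenix-dpo",          # placeholder
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    warmup_ratio=0.1,
    num_train_epochs=1,
    bf16=True,                         # assumption
)

trainer = DPOTrainer(
    model=sft_model,
    ref_model=None,                    # trl creates a frozen reference copy when None
    args=dpo_args,
    beta=0.1,                          # assumption; beta is not listed above
    train_dataset=train_dataset,       # needs "prompt", "chosen", "rejected" columns
    tokenizer=sft_tokenizer,
)
trainer.train()
```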

### Citation
```
@misc{uhlig2024phoenix,
      title={PHOENIX: Open-Source Language Adaption for Direct Preference Optimization},
      author={Matthias Uhlig and Sigurd Schacht and Sudarshan Kamath Barkur},
      year={2024},
      eprint={2401.10580},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

### Framework versions