papahawk committed on
Commit
3573cc9
1 Parent(s): 4e4b40e

Update README.md

  1. README.md +406 -0
README.md CHANGED
@@ -1,3 +1,409 @@
  ---
+ tags:
+ - generated_from_trainer
  license: mit
+ datasets:
+ - HuggingFaceH4/ultrachat_200k
+ - HuggingFaceH4/ultrafeedback_binarized
+ language:
+ - en
+ base_model: HuggingFaceH4/zephyr-7b-beta
+ widget:
+   - text: "<|system|>\nYou are a pirate chatbot who always responds with Arr!</s>\n<|user|>\nThere's a llama on my lawn, how can I get rid of him?</s>\n<|assistant|>\n"
+     output:
+       text: "Arr! 'Tis a puzzlin' matter, me hearty! A llama on yer lawn be a rare sight, but I've got a plan that might help ye get rid of 'im. Ye'll need to gather some carrots and hay, and then lure the llama away with the promise of a tasty treat. Once he's gone, ye can clean up yer lawn and enjoy the peace and quiet once again. But beware, me hearty, for there may be more llamas where that one came from! Arr!"
+ pipeline_tag: text-generation
+ model-index:
+ - name: devi-7b-beta
+   results:
+   # AI2 Reasoning Challenge (25-Shot)
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       name: normalized accuracy
+       value: 62.03071672354948
+     source:
+       name: Open LLM Leaderboard
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
+
+   # HellaSwag (10-shot)
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       name: normalized accuracy
+       value: 84.35570603465445
+     source:
+       name: Open LLM Leaderboard
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
+
+   # DROP (3-shot)
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Drop (3-Shot)
+       type: drop
+       split: validation
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: f1
+       name: f1 score
+       value: 9.662437080536909
+     source:
+       name: Open LLM Leaderboard
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
+
+   # TruthfulQA (0-shot)
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 57.44916942762855
+     source:
+       name: Open LLM Leaderboard
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
+
+   # GSM8k (5-shot)
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       name: accuracy
+       value: 12.736921910538287
+     source:
+       name: Open LLM Leaderboard
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
+
+   # MMLU (5-Shot)
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       name: accuracy
+       value: 61.07
+     source:
+       name: Open LLM Leaderboard
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
+
+   # Winogrande (5-shot)
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       name: accuracy
+       value: 77.74269928966061
+     source:
+       name: Open LLM Leaderboard
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
+
+   # AlpacaEval (taken from model card)
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AlpacaEval
+       type: tatsu-lab/alpaca_eval
+     metrics:
+     - type: unknown
+       name: win rate
+       value: 0.9060
+     source:
+       url: https://tatsu-lab.github.io/alpaca_eval/
+
+   # MT-Bench (taken from model card)
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MT-Bench
+       type: unknown
+     metrics:
+     - type: unknown
+       name: score
+       value: 7.34
+     source:
+       url: https://huggingface.co/spaces/lmsys/mt-bench
  ---
+
+ <h1 style='text-align: center '>✨Devi AI 7B✨</h1>
+ <h1 style='text-align: center '><em>fork of zephyr-7b β </em> </h1>
+ <h2 style='text-align: center '>All credit and thanks to HuggingFaceH4 for their work!</h2>
+ <img src="https://alt-web.xyz/images/rainbow.png" alt="Rainbow Solutions" width="800" style="margin-left: auto; margin-right: auto; display: block;"/>
+
+ # Model Card for Devi AI 7B
+
+ Zephyr is a series of language models trained to act as helpful assistants. Zephyr-7B-β is the second model in the series, a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) that was trained on a mix of publicly available, synthetic datasets using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290). We found that removing the in-built alignment of these datasets boosted performance on [MT Bench](https://huggingface.co/spaces/lmsys/mt-bench) and made the model more helpful. However, this means that the model is likely to generate problematic text when prompted to do so. You can find more details in the [technical report](https://arxiv.org/abs/2310.16944).
+
+
+ ## Model description
+
+ - **Model type:** A 7B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
+ - **Language(s) (NLP):** Primarily English
+ - **License:** MIT
+ - **Finetuned from model:** [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
+
+ ### Model Sources
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** https://github.com/huggingface/alignment-handbook
+ - **Demo:** https://huggingface.co/spaces/HuggingFaceH4/zephyr-chat
+ - **Chatbot Arena:** Evaluate Zephyr 7B against 10+ LLMs in the LMSYS arena: http://arena.lmsys.org
+
+ ## Performance
+
+ At the time of release, Zephyr-7B-β is the highest-ranked 7B chat model on the [MT-Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmarks:
+
+ | Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) |
+ |-------------|-----|----|---------------|--------------|
+ | StableLM-Tuned-α | 7B | dSFT | 2.75 | - |
+ | MPT-Chat | 7B | dSFT | 5.42 | - |
+ | Xwin-LM v0.1 | 7B | dPPO | 6.19 | 87.83 |
+ | Mistral-Instruct v0.1 | 7B | - | 6.84 | - |
+ | Zephyr-7b-α | 7B | dDPO | 6.88 | - |
+ | **Zephyr-7b-β** 🪁 | **7B** | **dDPO** | **7.34** | **90.60** |
+ | Falcon-Instruct | 40B | dSFT | 5.17 | 45.71 |
+ | Guanaco | 65B | SFT | 6.41 | 71.80 |
+ | Llama2-Chat | 70B | RLHF | 6.86 | 92.66 |
+ | Vicuna v1.3 | 33B | dSFT | 7.12 | 88.99 |
+ | WizardLM v1.0 | 70B | dSFT | 7.71 | - |
+ | Xwin-LM v0.1 | 70B | dPPO | - | 95.57 |
+ | GPT-3.5-turbo | - | RLHF | 7.94 | 89.37 |
+ | Claude 2 | - | RLHF | 8.06 | 91.36 |
+ | GPT-4 | - | RLHF | 8.99 | 95.28 |
+
+ In particular, on several categories of MT-Bench, Zephyr-7B-β performs strongly compared to larger open models like Llama2-Chat-70B:
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6200d0a443eb0913fa2df7cc/raxvt5ma16d7T23my34WC.png)
+
+ However, on more complex tasks like coding and mathematics, Zephyr-7B-β lags behind proprietary models, and more research is needed to close the gap.
+
+
+ ## Intended uses & limitations
+
+ The model was initially fine-tuned on a filtered and preprocessed version of the [`UltraChat`](https://huggingface.co/datasets/stingning/ultrachat) dataset, which contains a diverse range of synthetic dialogues generated by ChatGPT.
+ We then further aligned the model with [🤗 TRL's](https://github.com/huggingface/trl) `DPOTrainer` on the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, which contains 64k prompts and model completions that are ranked by GPT-4. As a result, the model can be used for chat, and you can check out our [demo](https://huggingface.co/spaces/HuggingFaceH4/zephyr-chat) to test its capabilities.
+
+ You can find the datasets used for training Zephyr-7B-β [here](https://huggingface.co/collections/HuggingFaceH4/zephyr-7b-6538c6d6d5ddd1cbb1744a66).
+
+ Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:
+
+ ```python
+ # Install transformers from source - only needed for versions <= v4.34
+ # pip install git+https://github.com/huggingface/transformers.git
+ # pip install accelerate
+
+ import torch
+ from transformers import pipeline
+
+ pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.bfloat16, device_map="auto")
+
+ # We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
+ messages = [
+     {
+         "role": "system",
+         "content": "You are a friendly chatbot who always responds in the style of a pirate",
+     },
+     {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
+ ]
+ prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
+ print(outputs[0]["generated_text"])
+ # Example output (sampling is enabled, so the assistant's reply will vary):
+ # <|system|>
+ # You are a friendly chatbot who always responds in the style of a pirate</s>
+ # <|user|>
+ # How many helicopters can a human eat in one sitting?</s>
+ # <|assistant|>
+ # Ah, me hearty matey! But yer question be a puzzler! A human cannot eat a helicopter in one sitting, as helicopters are not edible. They be made of metal, plastic, and other materials, not food!
+ ```
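The transcript printed by the example follows Zephyr's chat format: each turn is a role tag on its own line, the message text, and an `</s>` end-of-sequence marker, with a trailing `<|assistant|>` tag to cue the reply. The sketch below rebuilds that string by hand so the layout is easy to see. It assumes the format shown in the widget example in this card's metadata. `build_zephyr_prompt` is a hypothetical helper, not a Transformers API, and the tokenizer's own chat template remains the authoritative source (it may differ in whitespace details).

```python
def build_zephyr_prompt(messages, add_generation_prompt=True):
    """Format chat messages in the Zephyr style (hypothetical helper)."""
    parts = []
    for message in messages:
        # Each turn: role tag, newline, content, end-of-sequence marker.
        parts.append(f"<|{message['role']}|>\n{message['content']}</s>\n")
    if add_generation_prompt:
        # Cue the model to answer as the assistant.
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds with Arr!"},
    {"role": "user", "content": "There's a llama on my lawn, how can I get rid of him?"},
]
prompt = build_zephyr_prompt(messages)
print(prompt)
```

In practice `apply_chat_template` is the safer choice, since it reads the template shipped with the tokenizer instead of a hand-copied format.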
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ Zephyr-7B-β has not been aligned to human preferences for safety within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so).
+ The size and composition of the corpus used to train the base model (`mistralai/Mistral-7B-v0.1`) are also unknown; however, it is likely to have included a mix of Web data and technical sources like books and code. See the [Falcon 180B model card](https://huggingface.co/tiiuae/falcon-180B#training-data) for an example of this.
+
+
+ ## Training and evaluation data
+
+ During DPO training, this model achieves the following results on the evaluation set:
+
+ - Loss: 0.7496
+ - Rewards/chosen: -4.5221
+ - Rewards/rejected: -8.3184
+ - Rewards/accuracies: 0.7812
+ - Rewards/margins: 3.7963
+ - Logps/rejected: -340.1541
+ - Logps/chosen: -299.4561
+ - Logits/rejected: -2.3081
+ - Logits/chosen: -2.3531
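These numbers are internally consistent: the reward margin is the chosen reward minus the rejected reward. The sketch below checks that, and also evaluates the standard sigmoid DPO loss for a single pair with that margin. The loss form is an assumption (the card does not say which DPO loss variant was used), and the variable names are illustrative.

```python
import math

# Evaluation-set values reported above.
rewards_chosen = -4.5221
rewards_rejected = -8.3184

# Reward margin: how much more the policy favors the chosen completion.
margin = rewards_chosen - rewards_rejected
print(f"margin = {margin:.4f}")

# Assumed sigmoid DPO loss for one pair: -log(sigmoid(margin)).
pair_loss = math.log1p(math.exp(-margin))
print(f"loss for a single pair at the mean margin = {pair_loss:.4f}")
```

The single-pair value is far below the reported mean loss of 0.7496. One plausible reading is that the loss is averaged per example, so pairs with a negative margin contribute large losses that the mean margin hides.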
+
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-07
+ - train_batch_size: 2
+ - eval_batch_size: 4
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 16
+ - total_train_batch_size: 32
+ - total_eval_batch_size: 64
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 3.0
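Two of these values follow from the others. The sketch below recomputes the total batch sizes as per-device batch size times device count (assuming no gradient accumulation, which the list does not mention) and illustrates a linear schedule with a 0.1 warmup ratio. `linear_lr` is a hypothetical helper written for illustration, and `total_steps=1000` is a placeholder, not the run's actual step count.

```python
train_batch_size = 2   # per device
eval_batch_size = 4    # per device
num_devices = 16

total_train_batch_size = train_batch_size * num_devices
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)  # 32 64


def linear_lr(step, total_steps, base_lr=5e-07, warmup_ratio=0.1):
    """Linear warmup to base_lr, then linear decay to zero (illustrative)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at a few points of a hypothetical 1000-step run.
for step in (0, 50, 100, 550, 1000):
    print(step, linear_lr(step, total_steps=1000))
```

With these settings the rate climbs to its 5e-07 peak over the first tenth of training and then falls linearly to zero.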
+
+ ### Training results
+
+ The table below shows the full set of DPO training metrics:
+
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.6284 | 0.05 | 100 | 0.6098 | 0.0425 | -0.1872 | 0.7344 | 0.2297 | -258.8416 | -253.8099 | -2.7976 | -2.8234 |
+ | 0.4908 | 0.1 | 200 | 0.5426 | -0.0279 | -0.6842 | 0.75 | 0.6563 | -263.8124 | -254.5145 | -2.7719 | -2.7960 |
+ | 0.5264 | 0.15 | 300 | 0.5324 | 0.0414 | -0.9793 | 0.7656 | 1.0207 | -266.7627 | -253.8209 | -2.7892 | -2.8122 |
+ | 0.5536 | 0.21 | 400 | 0.4957 | -0.0185 | -1.5276 | 0.7969 | 1.5091 | -272.2460 | -254.4203 | -2.8542 | -2.8764 |
+ | 0.5362 | 0.26 | 500 | 0.5031 | -0.2630 | -1.5917 | 0.7812 | 1.3287 | -272.8869 | -256.8653 | -2.8702 | -2.8958 |
+ | 0.5966 | 0.31 | 600 | 0.5963 | -0.2993 | -1.6491 | 0.7812 | 1.3499 | -273.4614 | -257.2279 | -2.8778 | -2.8986 |
+ | 0.5014 | 0.36 | 700 | 0.5382 | -0.2859 | -1.4750 | 0.75 | 1.1891 | -271.7204 | -257.0942 | -2.7659 | -2.7869 |
+ | 0.5334 | 0.41 | 800 | 0.5677 | -0.4289 | -1.8968 | 0.7969 | 1.4679 | -275.9378 | -258.5242 | -2.7053 | -2.7265 |
+ | 0.5251 | 0.46 | 900 | 0.5772 | -0.2116 | -1.3107 | 0.7344 | 1.0991 | -270.0768 | -256.3507 | -2.8463 | -2.8662 |
+ | 0.5205 | 0.52 | 1000 | 0.5262 | -0.3792 | -1.8585 | 0.7188 | 1.4793 | -275.5552 | -258.0276 | -2.7893 | -2.7979 |
+ | 0.5094 | 0.57 | 1100 | 0.5433 | -0.6279 | -1.9368 | 0.7969 | 1.3089 | -276.3377 | -260.5136 | -2.7453 | -2.7536 |
+ | 0.5837 | 0.62 | 1200 | 0.5349 | -0.3780 | -1.9584 | 0.7656 | 1.5804 | -276.5542 | -258.0154 | -2.7643 | -2.7756 |
+ | 0.5214 | 0.67 | 1300 | 0.5732 | -1.0055 | -2.2306 | 0.7656 | 1.2251 | -279.2761 | -264.2903 | -2.6986 | -2.7113 |
+ | 0.6914 | 0.72 | 1400 | 0.5137 | -0.6912 | -2.1775 | 0.7969 | 1.4863 | -278.7448 | -261.1467 | -2.7166 | -2.7275 |
+ | 0.4655 | 0.77 | 1500 | 0.5090 | -0.7987 | -2.2930 | 0.7031 | 1.4943 | -279.8999 | -262.2220 | -2.6651 | -2.6838 |
+ | 0.5731 | 0.83 | 1600 | 0.5312 | -0.8253 | -2.3520 | 0.7812 | 1.5268 | -280.4902 | -262.4876 | -2.6543 | -2.6728 |
+ | 0.5233 | 0.88 | 1700 | 0.5206 | -0.4573 | -2.0951 | 0.7812 | 1.6377 | -277.9205 | -258.8084 | -2.6870 | -2.7097 |
+ | 0.5593 | 0.93 | 1800 | 0.5231 | -0.5508 | -2.2000 | 0.7969 | 1.6492 | -278.9703 | -259.7433 | -2.6221 | -2.6519 |
+ | 0.4967 | 0.98 | 1900 | 0.5290 | -0.5340 | -1.9570 | 0.8281 | 1.4230 | -276.5395 | -259.5749 | -2.6564 | -2.6878 |
+ | 0.0921 | 1.03 | 2000 | 0.5368 | -1.1376 | -3.1615 | 0.7812 | 2.0239 | -288.5854 | -265.6111 | -2.6040 | -2.6345 |
+ | 0.0733 | 1.08 | 2100 | 0.5453 | -1.1045 | -3.4451 | 0.7656 | 2.3406 | -291.4208 | -265.2799 | -2.6289 | -2.6595 |
+ | 0.0972 | 1.14 | 2200 | 0.5571 | -1.6915 | -3.9823 | 0.8125 | 2.2908 | -296.7934 | -271.1505 | -2.6471 | -2.6709 |
+ | 0.1058 | 1.19 | 2300 | 0.5789 | -1.0621 | -3.8941 | 0.7969 | 2.8319 | -295.9106 | -264.8563 | -2.5527 | -2.5798 |
+ | 0.2423 | 1.24 | 2400 | 0.5455 | -1.1963 | -3.5590 | 0.7812 | 2.3627 | -292.5599 | -266.1981 | -2.5414 | -2.5784 |
+ | 0.1177 | 1.29 | 2500 | 0.5889 | -1.8141 | -4.3942 | 0.7969 | 2.5801 | -300.9120 | -272.3761 | -2.4802 | -2.5189 |
+ | 0.1213 | 1.34 | 2600 | 0.5683 | -1.4608 | -3.8420 | 0.8125 | 2.3812 | -295.3901 | -268.8436 | -2.4774 | -2.5207 |
+ | 0.0889 | 1.39 | 2700 | 0.5890 | -1.6007 | -3.7337 | 0.7812 | 2.1330 | -294.3068 | -270.2423 | -2.4123 | -2.4522 |
+ | 0.0995 | 1.45 | 2800 | 0.6073 | -1.5519 | -3.8362 | 0.8281 | 2.2843 | -295.3315 | -269.7538 | -2.4685 | -2.5050 |
+ | 0.1145 | 1.5 | 2900 | 0.5790 | -1.7939 | -4.2876 | 0.8438 | 2.4937 | -299.8461 | -272.1744 | -2.4272 | -2.4674 |
+ | 0.0644 | 1.55 | 3000 | 0.5735 | -1.7285 | -4.2051 | 0.8125 | 2.4766 | -299.0209 | -271.5201 | -2.4193 | -2.4574 |
+ | 0.0798 | 1.6 | 3100 | 0.5537 | -1.7226 | -4.2850 | 0.8438 | 2.5624 | -299.8200 | -271.4610 | -2.5367 | -2.5696 |
+ | 0.1013 | 1.65 | 3200 | 0.5575 | -1.5715 | -3.9813 | 0.875 | 2.4098 | -296.7825 | -269.9498 | -2.4926 | -2.5267 |
+ | 0.1254 | 1.7 | 3300 | 0.5905 | -1.6412 | -4.4703 | 0.8594 | 2.8291 | -301.6730 | -270.6473 | -2.5017 | -2.5340 |
+ | 0.085 | 1.76 | 3400 | 0.6133 | -1.9159 | -4.6760 | 0.8438 | 2.7601 | -303.7296 | -273.3941 | -2.4614 | -2.4960 |
+ | 0.065 | 1.81 | 3500 | 0.6074 | -1.8237 | -4.3525 | 0.8594 | 2.5288 | -300.4951 | -272.4724 | -2.4597 | -2.5004 |
+ | 0.0755 | 1.86 | 3600 | 0.5836 | -1.9252 | -4.4005 | 0.8125 | 2.4753 | -300.9748 | -273.4872 | -2.4327 | -2.4716 |
+ | 0.0746 | 1.91 | 3700 | 0.5789 | -1.9280 | -4.4906 | 0.8125 | 2.5626 | -301.8762 | -273.5149 | -2.4686 | -2.5115 |
+ | 0.1348 | 1.96 | 3800 | 0.6015 | -1.8658 | -4.2428 | 0.8281 | 2.3769 | -299.3976 | -272.8936 | -2.4943 | -2.5393 |
+ | 0.0217 | 2.01 | 3900 | 0.6122 | -2.3335 | -4.9229 | 0.8281 | 2.5894 | -306.1988 | -277.5699 | -2.4841 | -2.5272 |
+ | 0.0219 | 2.07 | 4000 | 0.6522 | -2.9890 | -6.0164 | 0.8281 | 3.0274 | -317.1334 | -284.1248 | -2.4105 | -2.4545 |
+ | 0.0119 | 2.12 | 4100 | 0.6922 | -3.4777 | -6.6749 | 0.7969 | 3.1972 | -323.7187 | -289.0121 | -2.4272 | -2.4699 |
+ | 0.0153 | 2.17 | 4200 | 0.6993 | -3.2406 | -6.6775 | 0.7969 | 3.4369 | -323.7453 | -286.6413 | -2.4047 | -2.4465 |
+ | 0.011 | 2.22 | 4300 | 0.7178 | -3.7991 | -7.4397 | 0.7656 | 3.6406 | -331.3667 | -292.2260 | -2.3843 | -2.4290 |
+ | 0.0072 | 2.27 | 4400 | 0.6840 | -3.3269 | -6.8021 | 0.8125 | 3.4752 | -324.9908 | -287.5042 | -2.4095 | -2.4536 |
+ | 0.0197 | 2.32 | 4500 | 0.7013 | -3.6890 | -7.3014 | 0.8125 | 3.6124 | -329.9841 | -291.1250 | -2.4118 | -2.4543 |
+ | 0.0182 | 2.37 | 4600 | 0.7476 | -3.8994 | -7.5366 | 0.8281 | 3.6372 | -332.3356 | -293.2291 | -2.4163 | -2.4565 |
+ | 0.0125 | 2.43 | 4700 | 0.7199 | -4.0560 | -7.5765 | 0.8438 | 3.5204 | -332.7345 | -294.7952 | -2.3699 | -2.4100 |
+ | 0.0082 | 2.48 | 4800 | 0.7048 | -3.6613 | -7.1356 | 0.875 | 3.4743 | -328.3255 | -290.8477 | -2.3925 | -2.4303 |
+ | 0.0118 | 2.53 | 4900 | 0.6976 | -3.7908 | -7.3152 | 0.8125 | 3.5244 | -330.1224 | -292.1431 | -2.3633 | -2.4047 |
+ | 0.0118 | 2.58 | 5000 | 0.7198 | -3.9049 | -7.5557 | 0.8281 | 3.6508 | -332.5271 | -293.2844 | -2.3764 | -2.4194 |
+ | 0.006 | 2.63 | 5100 | 0.7506 | -4.2118 | -7.9149 | 0.8125 | 3.7032 | -336.1194 | -296.3530 | -2.3407 | -2.3860 |
+ | 0.0143 | 2.68 | 5200 | 0.7408 | -4.2433 | -7.9802 | 0.8125 | 3.7369 | -336.7721 | -296.6682 | -2.3509 | -2.3946 |
+ | 0.0057 | 2.74 | 5300 | 0.7552 | -4.3392 | -8.0831 | 0.7969 | 3.7439 | -337.8013 | -297.6275 | -2.3388 | -2.3842 |
+ | 0.0138 | 2.79 | 5400 | 0.7404 | -4.2395 | -7.9762 | 0.8125 | 3.7367 | -336.7322 | -296.6304 | -2.3286 | -2.3737 |
+ | 0.0079 | 2.84 | 5500 | 0.7525 | -4.4466 | -8.2196 | 0.7812 | 3.7731 | -339.1662 | -298.7007 | -2.3200 | -2.3641 |
+ | 0.0077 | 2.89 | 5600 | 0.7520 | -4.5586 | -8.3485 | 0.7969 | 3.7899 | -340.4545 | -299.8206 | -2.3078 | -2.3517 |
+ | 0.0094 | 2.94 | 5700 | 0.7527 | -4.5542 | -8.3509 | 0.7812 | 3.7967 | -340.4790 | -299.7773 | -2.3062 | -2.3510 |
+ | 0.0054 | 2.99 | 5800 | 0.7520 | -4.5169 | -8.3079 | 0.7812 | 3.7911 | -340.0493 | -299.4038 | -2.3081 | -2.3530 |
+
+
+ ### Framework versions
+
+ - Transformers 4.35.0.dev0
+ - PyTorch 2.0.1+cu118
+ - Datasets 2.12.0
+ - Tokenizers 0.14.0
+
+ ## Citation
+
+ If you find Zephyr-7B-β useful in your work, please cite it with:
+
+ ```
+ @misc{tunstall2023zephyr,
+       title={Zephyr: Direct Distillation of LM Alignment},
+       author={Lewis Tunstall and Edward Beeching and Nathan Lambert and Nazneen Rajani and Kashif Rasul and Younes Belkada and Shengyi Huang and Leandro von Werra and Clémentine Fourrier and Nathan Habib and Nathan Sarrazin and Omar Sanseviero and Alexander M. Rush and Thomas Wolf},
+       year={2023},
+       eprint={2310.16944},
+       archivePrefix={arXiv},
+       primaryClass={cs.LG}
+ }
+ ```
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta)
+
+ | Metric | Value |
+ |-----------------------|---------------------------|
+ | Avg. | 52.15 |
+ | ARC (25-shot) | 62.03 |
+ | HellaSwag (10-shot) | 84.36 |
+ | MMLU (5-shot) | 61.07 |
+ | TruthfulQA (0-shot) | 57.45 |
+ | Winogrande (5-shot) | 77.74 |
+ | GSM8K (5-shot) | 12.74 |
+ | DROP (3-shot) | 9.66 |
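The `Avg.` row matches the unweighted mean of the seven benchmark scores, which can be checked with a few lines of Python. The scores are copied from the table, and the variable names are illustrative.

```python
# Benchmark scores from the table above (the Avg. row is excluded).
scores = {
    "ARC (25-shot)": 62.03,
    "HellaSwag (10-shot)": 84.36,
    "MMLU (5-shot)": 61.07,
    "TruthfulQA (0-shot)": 57.45,
    "Winogrande (5-shot)": 77.74,
    "GSM8K (5-shot)": 12.74,
    "DROP (3-shot)": 9.66,
}

# Unweighted mean across all seven benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")
```

Because the mean is unweighted, the two low scores (GSM8K and DROP) pull the average well below the other five results.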