Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

juanako-7b-UNA - GGUF
- Model creator: https://huggingface.co/fblgit/
- Original model: https://huggingface.co/fblgit/juanako-7b-UNA/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [juanako-7b-UNA.Q2_K.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.Q2_K.gguf) | Q2_K | 2.53GB |
| [juanako-7b-UNA.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.IQ3_XS.gguf) | IQ3_XS | 2.81GB |
| [juanako-7b-UNA.IQ3_S.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.IQ3_S.gguf) | IQ3_S | 2.96GB |
| [juanako-7b-UNA.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.Q3_K_S.gguf) | Q3_K_S | 2.95GB |
| [juanako-7b-UNA.IQ3_M.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.IQ3_M.gguf) | IQ3_M | 3.06GB |
| [juanako-7b-UNA.Q3_K.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.Q3_K.gguf) | Q3_K | 3.28GB |
| [juanako-7b-UNA.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.Q3_K_M.gguf) | Q3_K_M | 3.28GB |
| [juanako-7b-UNA.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.Q3_K_L.gguf) | Q3_K_L | 3.56GB |
| [juanako-7b-UNA.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.IQ4_XS.gguf) | IQ4_XS | 3.67GB |
| [juanako-7b-UNA.Q4_0.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.Q4_0.gguf) | Q4_0 | 3.83GB |
| [juanako-7b-UNA.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.IQ4_NL.gguf) | IQ4_NL | 0.92GB |
| [juanako-7b-UNA.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.Q4_K_S.gguf) | Q4_K_S | 0.07GB |
| [juanako-7b-UNA.Q4_K.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.Q4_K.gguf) | Q4_K | 0.0GB |
| [juanako-7b-UNA.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.Q4_K_M.gguf) | Q4_K_M | 0.0GB |
| [juanako-7b-UNA.Q4_1.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.Q4_1.gguf) | Q4_1 | 0.0GB |
| [juanako-7b-UNA.Q5_0.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.Q5_0.gguf) | Q5_0 | 0.0GB |
| [juanako-7b-UNA.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.Q5_K_S.gguf) | Q5_K_S | 0.0GB |
| [juanako-7b-UNA.Q5_K.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.Q5_K.gguf) | Q5_K | 0.0GB |
| [juanako-7b-UNA.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.Q5_K_M.gguf) | Q5_K_M | 0.0GB |
| [juanako-7b-UNA.Q5_1.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.Q5_1.gguf) | Q5_1 | 0.0GB |
| [juanako-7b-UNA.Q6_K.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.Q6_K.gguf) | Q6_K | 0.0GB |
| [juanako-7b-UNA.Q8_0.gguf](https://huggingface.co/RichardErkhov/fblgit_-_juanako-7b-UNA-gguf/blob/main/juanako-7b-UNA.Q8_0.gguf) | Q8_0 | 0.0GB |
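When choosing between these files, a common rule of thumb is to pick the largest quant that fits your memory budget. A minimal illustrative helper (hypothetical, using a few of the reliably-sized entries from the table above):

```python
# Illustrative helper: pick the largest quant file that fits a memory budget.
# Sizes (GB) are copied from a subset of the table above; this is a sketch,
# not part of this repository.
QUANT_SIZES_GB = {
    "Q2_K": 2.53,
    "Q3_K_S": 2.95,
    "Q3_K_M": 3.28,
    "Q3_K_L": 3.56,
    "IQ4_XS": 3.67,
    "Q4_0": 3.83,
}

def pick_quant(budget_gb):
    """Return the largest listed quant whose file size fits within budget_gb."""
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= budget_gb}
    return max(fitting, key=fitting.get) if fitting else None

print(pick_quant(4.0))  # Q4_0 (3.83 GB) is the largest listed quant under 4 GB
print(pick_quant(3.0))  # Q3_K_S (2.95 GB)
```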

Original model description:
---
license: apache-2.0
tags:
- alignment-handbook
- generated_from_trainer
- juanako
- mistral
- UNA
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: juanako-7b-UNA
  results:
  - task:
      type: text-generation
      name: TruthfulQA (MC2)
    dataset:
      name: truthful_qa
      type: text-generation
      config: multiple_choice
      split: validation
    metrics:
    - type: accuracy
      value: 65.13
      verified: true
  - task:
      type: text-generation
      name: ARC-Challenge
    dataset:
      name: ai2_arc
      type: text-generation
      config: ARC-Challenge
      split: test
    metrics:
    - type: accuracy
      value: 68.17
      verified: true
  - task:
      type: text-generation
      name: HellaSwag
    dataset:
      name: Rowan/hellaswag
      type: text-generation
      split: test
    metrics:
    - type: accuracy
      value: 85.34
      verified: true
    - type: accuracy
      value: 83.57
  - task:
      type: text-generation
      name: Winogrande
    dataset:
      name: winogrande
      type: text-generation
      config: winogrande_debiased
      split: test
    metrics:
    - type: accuracy
      value: 78.85
      verified: true
  - task:
      type: text-generation
      name: MMLU
    dataset:
      name: cais/mmlu
      type: text-generation
      config: all
      split: test
    metrics:
    - type: accuracy
      value: 62.47
      verified: true
  - task:
      type: text-generation
      name: DROP
    dataset:
      name: drop
      type: text-generation
      split: validation
    metrics:
    - type: accuracy
      value: 38.74
      verified: true
  - task:
      type: text-generation
      name: PubMedQA
    dataset:
      name: bigbio/pubmed_qa
      type: text-generation
      config: pubmed_qa_artificial_bigbio_qa
      split: validation
    metrics:
    - type: accuracy
      value: 76.0
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 68.17
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/juanako-7b-UNA
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 85.34
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/juanako-7b-UNA
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 62.47
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/juanako-7b-UNA
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 65.13
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/juanako-7b-UNA
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 78.85
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/juanako-7b-UNA
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 44.81
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/juanako-7b-UNA
      name: Open LLM Leaderboard
---

# juanako-7b-UNA (Uniform Neural Alignment)

This model is a fine-tuned version of [fblgit/juanako-7b-UNA-v2-phase-1](https://huggingface.co/fblgit/juanako-7b-UNA-v2-phase-1) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It outperforms most current Mistral-based models on many benchmarks and is the **latest and most powerful juanako version as of now**.

## Scores

The official HuggingFace results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/fblgit/juanako-7b-UNA/results_2023-11-28T08-33-33.965228.json).

| Model | Average ⬆️| ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️| TruthfulQA (MC) (0-s) ⬆️ | Winogrande (5-s) | GSM8K (5-s) | DROP (3-s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 50.32 | 59.58 | 83.31 | 64.16 | 42.15 | 78.37 | 18.12 | 6.14 |
| [Intel/neural-chat-7b-v3-1](https://huggingface.co/Intel/neural-chat-7b-v3-1) | 59.0 | 66.21 | 83.64 | 62.37 | 59.65 | 78.14 | 19.56 | 43.84 |
| [fblgit/juanako-7b-UNA](https://huggingface.co/fblgit/juanako-7b-UNA) | **59.91** | **68.17** | **85.34** | 62.47 | **65.13** | **78.85** | **20.7** | 38.74 |

It scores **59.91** according to the HuggingFace LLM Leaderboard.
It scores **65.1** with the `big-refactor` branch of lm-eval-harness.

Author: [Xavier M.](mailto:[email protected]) @fblgit

## Model description

juanako uses UNA (Uniform Neural Alignment), a training technique that eases alignment between transformer layers and has yet to be published.

### Prompts
The following prompts showed positive results. The best choice may depend on the task and needs further experimentation, but these should work for starters:
```
<|im_start|>system
- You are a helpful assistant chatbot trained by MosaicML.
- You answer questions.
- You are excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- You are more than just an information source, you are also able to write poetry, short stories, and make jokes.<|im_end|>
<|im_start|>user
Explain QKV<|im_end|>
<|im_start|>assistant
```
```
### Assistant: I am StableVicuna, a large language model created by CarperAI. I am here to chat!

### Human: Explain QKV
### Assistant:
```
```
[Round <|round|>]
问:Explain QKV
答:
```
```
[Round <|round|>]
Question:Explain QKV
Answer:
```
```
Question:Explain QKV
Answer:
```

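As a sketch, the ChatML-style template above can be assembled programmatically before being passed to any GGUF runtime (e.g. llama.cpp). The helper below is hypothetical and does pure string formatting only:

```python
# Hypothetical helper: build the ChatML-style prompt shown above.
def chatml_prompt(system, user):
    """Assemble a single-turn ChatML prompt ending at the assistant turn."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = chatml_prompt("You are a helpful assistant chatbot.", "Explain QKV")
print(prompt)
```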
## Evaluations (lm-eval big-refactor branch)

### TruthfulQA 0-Shot
```
| Tasks |Version|Filter|Metric|Value | |Stderr|
|--------------|-------|------|------|-----:|---|-----:|
|truthfulqa_mc2|Yaml |none |acc |0.6549|± |0.0153|
```
### ARC 25-Shot
```
| Tasks |Version|Filter| Metric |Value | |Stderr|
|-------------|-------|------|--------|-----:|---|-----:|
|arc_challenge|Yaml |none |acc |0.6476|± |0.0140|
| | |none |acc_norm|0.6809|± |0.0136|
```
### HellaSwag 10-Shot
```
| Tasks |Version|Filter| Metric |Value | |Stderr|
|---------|-------|------|--------|-----:|---|-----:|
|hellaswag|Yaml |none |acc |0.6703|± |0.0047|
| | |none |acc_norm|0.8520|± |0.0035|
```
### GSM8k 5-Shot
```
|Tasks|Version| Filter | Metric |Value | |Stderr|
|-----|-------|----------|-----------|-----:|---|-----:|
|gsm8k|Yaml |get-answer|exact_match|0.4898|± |0.0138|
```
### GPT Evaluations 0-Shot
```
| Tasks |Version|Filter| Metric |Value | |Stderr|
|--------------|-------|------|----------|-----:|---|-----:|
|boolq |Yaml |none |acc |0.8703|± |0.0059|
|lambada_openai|Yaml |none |perplexity|3.2598|± |0.0705|
| | |none |acc |0.7336|± |0.0062|
|piqa |Yaml |none |acc |0.8254|± |0.0089|
| | |none |acc_norm |0.8292|± |0.0088|
|sciq |Yaml |none |acc |0.9580|± |0.0063|
| | |none |acc_norm |0.9130|± |0.0089|
```
### MathQA 0-Shot
```
|Tasks |Version|Filter| Metric |Value | |Stderr|
|------|-------|------|--------|-----:|---|-----:|
|mathqa|Yaml |none |acc |0.3752|± |0.0089|
| | |none |acc_norm|0.3772|± |0.0089|
```
### PiQa 1-Shot
```
|Tasks|Version|Filter| Metric |Value | |Stderr|
|-----|-------|------|--------|-----:|---|-----:|
|piqa |Yaml |none |acc |0.8308|± |0.0087|
| | |none |acc_norm|0.8357|± |0.0086|
```
### Winogrande 5-Shot
```
| Tasks |Version|Filter|Metric|Value| |Stderr|
|----------|-------|------|------|----:|---|-----:|
|winogrande|Yaml |none |acc |0.768|± |0.0119|
```
### PubMedQA 0-Shot
```
| Tasks |Version|Filter|Metric|Value| |Stderr|
|--------|-------|------|------|----:|---|-----:|
|pubmedqa|Yaml |none |acc | 0.76|± |0.0191|
```
### RACE 1-Shot
```
|Tasks|Version|Filter|Metric|Value | |Stderr|
|-----|-------|------|------|-----:|---|-----:|
|race |Yaml |none |acc |0.5282|± |0.0154|
```
### MMLU 5-Shot (8-Bit)
```
| Groups |Version|Filter|Metric|Value | |Stderr|
|------------------|-------|------|------|-----:|---|-----:|
|mmlu |N/A |none |acc |0.6137|± |0.1243|
| - humanities |N/A |none |acc |0.5671|± |0.1101|
| - other |N/A |none |acc |0.6859|± |0.1164|
| - social_sciences|N/A |none |acc |0.7195|± |0.0713|
| - stem |N/A |none |acc |0.5087|± |0.1297|
```
### DROP 3-Shot (8-Bit) (Instruct-Eval)
```
{'score': 0.49801113762927607}
{'drop': 49.8}
drop: 49.8
```

### CRASS 0-Shot (Instruct-Eval)
```
{'score': 0.8357664233576643}
{'crass': 83.58}
crass: 83.58
```

## Training Details

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 14
- gradient_accumulation_steps: 16
- total_train_batch_size: 224
- total_eval_batch_size: 14
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 1

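As a sanity check, the total train batch size above follows from the per-device batch size, the gradient accumulation steps, and the number of devices:

```python
# Effective (total) train batch size =
#   per-device batch size * gradient accumulation steps * number of devices.
train_batch_size = 1
gradient_accumulation_steps = 16
num_devices = 14

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 224, matching the value listed above
```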
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.4795 | 0.2 | 56 | 0.4958 | -1.3684 | -2.6385 | 0.7552 | 1.2701 | -265.3887 | -241.2612 | -2.2572 | -2.4922 |
| 0.4642 | 0.4 | 112 | 0.4859 | -1.0380 | -1.9769 | 0.7273 | 0.9389 | -258.7718 | -237.9569 | -2.2414 | -2.4751 |
| 0.4758 | 0.61 | 168 | 0.4808 | -1.2594 | -2.3704 | 0.7343 | 1.1110 | -262.7074 | -240.1708 | -2.2305 | -2.4633 |
| 0.4549 | 0.81 | 224 | 0.4768 | -1.1906 | -2.3201 | 0.7552 | 1.1295 | -262.2044 | -239.4827 | -2.2284 | -2.4610 |

### Framework versions

- Transformers 4.35.0-UNA
- Pytorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1

## Citations
If you find juanako useful, please cite:

```
@misc{juanako7buna,
  title={Juanako: Uniform Neural Alignment},
  author={Xavier Murias},
  year={2023},
  publisher={HuggingFace},
  journal={HuggingFace repository},
  howpublished={\url{https://huggingface.co/fblgit/juanako-7b-UNA}},
}
```

Thanks to all the brilliant humans behind the creation of AI; here are some of the works we find relevant to our research. If you feel a citation is missing, please contact us.
```
@misc{lin2021truthfulqa,
  title={TruthfulQA: Measuring How Models Mimic Human Falsehoods},
  author={Stephanie Lin and Jacob Hilton and Owain Evans},
  year={2021},
  eprint={2109.07958},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
@misc{tunstall2023zephyr,
  title={Zephyr: Direct Distillation of LM Alignment},
  author={Lewis Tunstall and Edward Beeching and Nathan Lambert and Nazneen Rajani and Kashif Rasul and Younes Belkada and Shengyi Huang and Leandro von Werra and Clémentine Fourrier and Nathan Habib and Nathan Sarrazin and Omar Sanseviero and Alexander M. Rush and Thomas Wolf},
  year={2023},
  eprint={2310.16944},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
@inproceedings{Bisk2020,
  author={Yonatan Bisk and Rowan Zellers and Ronan Le Bras and Jianfeng Gao and Yejin Choi},
  title={PIQA: Reasoning about Physical Commonsense in Natural Language},
  booktitle={Thirty-Fourth AAAI Conference on Artificial Intelligence},
  year={2020},
}
@software{eval-harness,
  author={Gao, Leo and Tow, Jonathan and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and McDonell, Kyle and Muennighoff, Niklas and Phang, Jason and Reynolds, Laria and Tang, Eric and Thite, Anish and Wang, Ben and Wang, Kevin and Zou, Andy},
  title={A framework for few-shot language model evaluation},
  month=sep,
  year=2021,
  publisher={Zenodo},
  version={v0.0.1},
  doi={10.5281/zenodo.5371628},
  url={https://doi.org/10.5281/zenodo.5371628}
}
@misc{rafailov2023direct,
  title={Direct Preference Optimization: Your Language Model is Secretly a Reward Model},
  author={Rafael Rafailov and Archit Sharma and Eric Mitchell and Stefano Ermon and Christopher D. Manning and Chelsea Finn},
  year={2023},
  eprint={2305.18290},
  archivePrefix={arXiv},
}
```

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_fblgit__juanako-7b-UNA).

| Metric |Value|
|---------------------------------|----:|
|Avg. |67.46|
|AI2 Reasoning Challenge (25-Shot)|68.17|
|HellaSwag (10-Shot) |85.34|
|MMLU (5-Shot) |62.47|
|TruthfulQA (0-shot) |65.13|
|Winogrande (5-shot) |78.85|
|GSM8k (5-shot) |44.81|
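The leaderboard average is the unweighted mean of the six benchmark scores, which can be checked directly:

```python
# Verify the leaderboard average: unweighted mean of the six scores above.
scores = {
    "ARC (25-shot)": 68.17,
    "HellaSwag (10-shot)": 85.34,
    "MMLU (5-shot)": 62.47,
    "TruthfulQA (0-shot)": 65.13,
    "Winogrande (5-shot)": 78.85,
    "GSM8k (5-shot)": 44.81,
}

avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 67.46, matching the Avg. row in the table
```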