---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- merge
- moe
- biology
- medical
- genetic
---

## Description

Three models were used in this 4x7B merge (a hypothetical merge config is sketched below):
- [cognitivecomputations/WestLake-7B-v2-laser](https://huggingface.co/cognitivecomputations/WestLake-7B-v2-laser)
- [mlabonne/Daredevil-7B](https://huggingface.co/mlabonne/Daredevil-7B) (used twice! I initially chose another model, but using Daredevil twice somehow worked better.)
- [BioMistral/BioMistral-7B](https://huggingface.co/BioMistral/BioMistral-7B)
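
The exact merge recipe was not published with this card, but a `mergekit-moe` config matching the description above might look like the following sketch. The `base_model` choice and the `positive_prompts` are illustrative assumptions, not the actual recipe:

```yaml
# Hypothetical mergekit-moe config; base_model and positive_prompts are
# illustrative assumptions, not the published recipe.
base_model: mlabonne/Daredevil-7B
gate_mode: hidden        # route tokens using hidden-state representations
dtype: bfloat16
experts:
  - source_model: cognitivecomputations/WestLake-7B-v2-laser
    positive_prompts: ["chat", "creative writing"]
  - source_model: mlabonne/Daredevil-7B
    positive_prompts: ["reasoning", "general knowledge"]
  - source_model: mlabonne/Daredevil-7B
    positive_prompts: ["math", "logic"]
  - source_model: BioMistral/BioMistral-7B
    positive_prompts: ["biology", "medicine", "genetics"]
```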


The result is a Mixture of Experts (MoE) of four top 7B models, including BioMistral-7B. Here is a comparison between BioMistral-7B and the merged model:

| Metric                      | BioMistral-7B | Bioxtral-4x7B-v0.1 |
|-----------------------------|---------------|--------------------|
| **Average**                 | 54.99         | **70.84**              |
| ARC                         | 54.27         | **68.34**              |
| HellaSwag                   | 79.09         | **87.27**              |
| TruthfulQA                  | 51.61         | **68.45**              |
| Winogrande                  | 73.48         | **82.90**              |
| GSM8K                       | 0             | **56.63**              |
| Professional Medicine       | 55.51         | **67.30**              |
| College Medicine            | 58.96         | **61.84**              |
| Medical Genetics            | 67.00         | **74.00**              |


## How to use it

```python
# Option 1: use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MaziyarPanahi/Bioxtral-4x7B-v0.1")

# Option 2: load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MaziyarPanahi/Bioxtral-4x7B-v0.1")
model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/Bioxtral-4x7B-v0.1")
```
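
For actual generation, here is a minimal sketch; the dtype, device placement, and sampling settings are illustrative assumptions, not values published with the model:

```python
# Minimal generation sketch; torch_dtype, device_map, and sampling settings
# are assumptions for illustration, not the author's recommended settings.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "MaziyarPanahi/Bioxtral-4x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # the 4x7B MoE is large; fp16 halves memory
    device_map="auto",          # requires the `accelerate` package
)

prompt = "What is the first-line treatment for an uncomplicated UTI in pregnancy?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```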

## Quantized models

GGUF quantizations, from 2-bit to 8-bit, are available here: https://huggingface.co/MaziyarPanahi/Bioxtral-4x7B-v0.1-GGUF
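
A minimal sketch of running one of the GGUF files locally with `llama-cpp-python` (the filename below is a guess; check the GGUF repo for the actual file names):

```python
# Hypothetical sketch: the GGUF filename is an assumption; download a real
# file from the -GGUF repo linked above first.
from llama_cpp import Llama

llm = Llama(model_path="Bioxtral-4x7B-v0.1.Q4_K_M.gguf", n_ctx=4096)
out = llm("Q: Which gene is associated with cystic fibrosis?\nA:", max_tokens=128)
print(out["choices"][0]["text"])
```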

## Examples


A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. Which of the following is the best treatment for this patient?

{ "A": "Ampicillin", "B": "Ceftriaxone", "C": "Doxycycline", "D": "Nitrofurantoin" }

```
The best treatment option for this patient would be **D: Nitrofurantoin**. This choice is in accordance with current guidelines recommending nitrofurantoin as the first line agent for uncomplicated lower UTI during pregnancy due to its safety profile. Although ampicillin and ceftriaxone may also be used if there is concern for pyelonephritis or resistance, they should generally be reserved for more severe cases. Doxycycline is contraindicated in pregnancy.</s>
```


A 67-year-old woman with advanced bladder cancer comes to the physician for a follow-up examination. She is currently undergoing chemotherapy with an agent that forms cross-links between DNA strands. Serum studies show a creatinine concentration of 2.1 mg/dL and a blood urea nitrogen concentration of 30 mg/dL. Urine dipstick of a clean-catch midstream specimen shows 2+ protein and 1+ glucose. Prior to initiation of chemotherapy, her laboratory values were within the reference range. In addition to hydration, administration of which of the following would most likely have prevented this patient's current condition?

{ "A": "Mesna", "B": "Amifostine", "C": "Rasburicase", "D": "Leucovorin" }


```
B: Amifostine</s>
```


Describe the pros and cons of the Docker system.

```
Answer:

Docker is an open-source platform for developing, shipping, and running applications by using containers. Like all technologies, Docker has its advantages and disadvantages. Here are some key points on both sides:

Pros (Advantages) of Docker:
1. **Portability**: One of the main benefits of Docker is its ability to create portable containers holding an application with all its dependencies. This makes it easy to run the same application consistently across different environments such as development, testing, and production.
2. **Efficiency**: Containers are lightweight and use resources efficiently compared to virtual machines. This allows more applications to be hosted on a single server.
3. **Simplified deployment**: With Docker, developers can package their code along with all its dependencies into a container which can then be deployed easily without worrying about compatibility issues.
4. **Improved collaboration**: Docker provides tools like Docker Hub which allow developers to share their images publicly or privately. This promotes collaboration among teams working on the same project.
5. **Automation**: Docker integrates well with many continuous integration and delivery systems, making it easier to automate the building, testing, and deploying of applications.

Cons (Disadvantages) of Docker:
1. **Steep learning curve**: For those new to containerization technology, there may be a significant learning curve involved in understanding how Docker works and mastering its commands.
2. **Additional management overhead**: Managing Docker containers requires additional effort compared to traditional methods. This includes tasks like monitoring, scaling, and orchestrating containers.
3. **Potential security risks**: While Docker itself is secure, improper configuration of containers could lead to potential vulnerabilities. It’s important to follow best practices when creating and managing containers.
4. **Vendor lock-in**: Although Docker is open source, if your organization relies heavily on proprietary extensions from Docker Inc., switching to another solution might become difficult.
5. **Limited storage**: Each container has its own file system, so storing large amounts of data within the container can be challenging. To address this issue, Docker Volumes were introduced but they require separate management.</s>
```


Hi, I need to get from New York to Los Angeles. Can you help me with that?
```tool_call
{
 "arguments": {"origin": "New York", "destination": "Los Angeles", "mode": ""},
 "name": "get_directions"
}
```</s>
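
The system prompt for this tool-calling example isn't shown in the card; presumably the model was primed with a function definition along these lines (a hypothetical schema matching the arguments above, written as a Python dict):

```python
# Hypothetical tool definition; the actual system prompt/schema used for this
# example was not published with the card.
get_directions_tool = {
    "name": "get_directions",
    "description": "Get directions between two locations.",
    "parameters": {
        "type": "object",
        "properties": {
            "origin": {"type": "string", "description": "Starting location"},
            "destination": {"type": "string", "description": "Ending location"},
            "mode": {"type": "string", "description": "Travel mode, e.g. driving"},
        },
        "required": ["origin", "destination"],
    },
}
```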


There are three killers in a room. Someone enters the room and kills one of them. Nobody leaves the room. How many killers are left in the room? Explain your reasoning step by step.

```
1. Initially, there are three killers in the room.
2. Then, someone else comes into the room and kills one of these three killers.
3. However, this "someone" is also considered as a person or killer (since they committed an act of killing).
4. So after the event, we have four people in total - the remaining two original killers plus the newcomer who killed one of the initial killers.
5. But since our question asks for the number of killers, we should only count those who were originally killers before anyone acted.
6. Therefore, there are still two killers left in the room.
```
The answer should be 3: the person who walked in is also a killer!


Mark is faster than Mary, Mary is faster than Joe. Is Joe faster than Mark? Let's think step by step.

```
Let's analyze the given statements one by one:

1. Mark is faster than Mary.
2. Mary is faster than Joe.

Now, let's try to determine the relative speeds of all three individuals based on these statements:

- From statement 1, we know that Mark has a higher speed compared to Mary.
- From statement 2, we know that Mary has a higher speed compared to Joe.

Combining both pieces of information, we can conclude that Mark must have the highest speed among the three (since he is faster than Mary and Mary is faster than Joe). Therefore, Joe is not faster than Mark.</s>
```


If I lay 2 wet shirts out in the sun flat on the ground to dry and it takes 5 hours until they are dry, how long would it take to dry 4 shirts of the same type that way simultaneously?

```
Assuming that the drying process is not limited by any other factor than the surface area exposed to the sunlight, we can use a simple proportionality argument.

If 2 shirts take 5 hours to dry when laid flat on the ground, then the combined surface area of these two shirts is equivalent to the drying time of 5 hours. Let's call this surface area equivalent to 5 "drying-hours per unit area."

Now, if we have 4 shirts of the same type and we lay them all flat on the ground at the same time, their combined surface area will be greater than that of the two previously dried shirts. However, since they are all of the same type, we can assume that the ratio of their surface areas to the drying time of 5 hours remains constant. In other words, the 4 new shirts will also have a surface area equivalent to 5 "drying-hours per unit area."

To find out how long it would take for these 4 shirts to dry simultaneously, we need to divide the total drying time (5 hours) by the number of shirts (4). This gives us:

Drying time for 4 shirts = Total drying time / Number of shirts
                        = 5 hours / 4
                        = 1.25 hours per shirt

So, it would take approximately 1 hour and 15 minutes (or 75 minutes) for the 4 shirts to dry simultaneously under the same conditions as the original 2 shirts.</s>
```
Wow, way to be wrong! The answer is still 5 hours: the shirts dry in parallel, they are drying, not baking!


Explain step by step: 25 - 4 * 2 + 3 = ?
```
Step 1: Identify the multiplication operation first. In this case, it is "4 * 2".

Step 2: Perform the multiplication. Multiplying 4 and 2 gives you 8.

Step 3: Now we have the expression "25 - 8 + 3". 

Step 4: Solve the subtraction next. Subtracting 8 from 25 gives you 17.

Step 5: Finally, add 3 to the result of the previous step. Adding 3 to 17 gives you a final answer of 20.

So, 25 - 4 * 2 + 3 = 20.</s>
```
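
For reference, the model's result matches standard operator precedence, e.g. in Python:

```python
# Multiplication binds tighter than addition/subtraction: 25 - (4 * 2) + 3
print(25 - 4 * 2 + 3)  # 20
```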


## Eval

![image/png](https://cdn-uploads.huggingface.co/production/uploads/5fd5e18a90b6dc4633f6d292/PR-Py7u6uhcxKTdCpPY4-.png)


Source: https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__Bioxtral-4x7B-v0.1

```json
{
    "all": {
        "acc": 0.6390815384774987,
        "acc_stderr": 0.03233527173865626,
        "acc_norm": 0.6405373328568302,
        "acc_norm_stderr": 0.032994557880045274,
        "mc1": 0.5152998776009792,
        "mc1_stderr": 0.017495304473187902,
        "mc2": 0.6845419346695587,
        "mc2_stderr": 0.014829461272743373
    },
    "harness|arc:challenge|25": {
        "acc": 0.658703071672355,
        "acc_stderr": 0.01385583128749772,
        "acc_norm": 0.6834470989761092,
        "acc_norm_stderr": 0.013592431519068079
    },
    "harness|hellaswag|10": {
        "acc": 0.6946823341963753,
        "acc_stderr": 0.004596006250433548,
        "acc_norm": 0.8727345150368453,
        "acc_norm_stderr": 0.003325890225529856
    },
    "harness|hendrycksTest-abstract_algebra|5": {
        "acc": 0.31,
        "acc_stderr": 0.04648231987117316,
        "acc_norm": 0.31,
        "acc_norm_stderr": 0.04648231987117316
    },
    "harness|hendrycksTest-anatomy|5": {
        "acc": 0.6370370370370371,
        "acc_stderr": 0.04153948404742397,
        "acc_norm": 0.6370370370370371,
        "acc_norm_stderr": 0.04153948404742397
    },
    "harness|hendrycksTest-astronomy|5": {
        "acc": 0.7105263157894737,
        "acc_stderr": 0.03690677986137283,
        "acc_norm": 0.7105263157894737,
        "acc_norm_stderr": 0.03690677986137283
    },
    "harness|hendrycksTest-business_ethics|5": {
        "acc": 0.63,
        "acc_stderr": 0.04852365870939099,
        "acc_norm": 0.63,
        "acc_norm_stderr": 0.04852365870939099
    },
    "harness|hendrycksTest-clinical_knowledge|5": {
        "acc": 0.6943396226415094,
        "acc_stderr": 0.028353298073322663,
        "acc_norm": 0.6943396226415094,
        "acc_norm_stderr": 0.028353298073322663
    },
    "harness|hendrycksTest-college_biology|5": {
        "acc": 0.7222222222222222,
        "acc_stderr": 0.037455547914624555,
        "acc_norm": 0.7222222222222222,
        "acc_norm_stderr": 0.037455547914624555
    },
    "harness|hendrycksTest-college_chemistry|5": {
        "acc": 0.44,
        "acc_stderr": 0.04988876515698589,
        "acc_norm": 0.44,
        "acc_norm_stderr": 0.04988876515698589
    },
    "harness|hendrycksTest-college_computer_science|5": {
        "acc": 0.56,
        "acc_stderr": 0.049888765156985884,
        "acc_norm": 0.56,
        "acc_norm_stderr": 0.049888765156985884
    },
    "harness|hendrycksTest-college_mathematics|5": {
        "acc": 0.29,
        "acc_stderr": 0.04560480215720684,
        "acc_norm": 0.29,
        "acc_norm_stderr": 0.04560480215720684
    },
    "harness|hendrycksTest-college_medicine|5": {
        "acc": 0.6184971098265896,
        "acc_stderr": 0.03703851193099521,
        "acc_norm": 0.6184971098265896,
        "acc_norm_stderr": 0.03703851193099521
    },
    "harness|hendrycksTest-college_physics|5": {
        "acc": 0.43137254901960786,
        "acc_stderr": 0.04928099597287534,
        "acc_norm": 0.43137254901960786,
        "acc_norm_stderr": 0.04928099597287534
    },
    "harness|hendrycksTest-computer_security|5": {
        "acc": 0.78,
        "acc_stderr": 0.041633319989322605,
        "acc_norm": 0.78,
        "acc_norm_stderr": 0.041633319989322605
    },
    "harness|hendrycksTest-conceptual_physics|5": {
        "acc": 0.5829787234042553,
        "acc_stderr": 0.03223276266711712,
        "acc_norm": 0.5829787234042553,
        "acc_norm_stderr": 0.03223276266711712
    },
    "harness|hendrycksTest-econometrics|5": {
        "acc": 0.45614035087719296,
        "acc_stderr": 0.04685473041907789,
        "acc_norm": 0.45614035087719296,
        "acc_norm_stderr": 0.04685473041907789
    },
    "harness|hendrycksTest-electrical_engineering|5": {
        "acc": 0.5310344827586206,
        "acc_stderr": 0.04158632762097828,
        "acc_norm": 0.5310344827586206,
        "acc_norm_stderr": 0.04158632762097828
    },
    "harness|hendrycksTest-elementary_mathematics|5": {
        "acc": 0.42328042328042326,
        "acc_stderr": 0.025446365634406786,
        "acc_norm": 0.42328042328042326,
        "acc_norm_stderr": 0.025446365634406786
    },
    "harness|hendrycksTest-formal_logic|5": {
        "acc": 0.47619047619047616,
        "acc_stderr": 0.04467062628403273,
        "acc_norm": 0.47619047619047616,
        "acc_norm_stderr": 0.04467062628403273
    },
    "harness|hendrycksTest-global_facts|5": {
        "acc": 0.28,
        "acc_stderr": 0.04512608598542128,
        "acc_norm": 0.28,
        "acc_norm_stderr": 0.04512608598542128
    },
    "harness|hendrycksTest-high_school_biology|5": {
        "acc": 0.7516129032258064,
        "acc_stderr": 0.024580028921481003,
        "acc_norm": 0.7516129032258064,
        "acc_norm_stderr": 0.024580028921481003
    },
    "harness|hendrycksTest-high_school_chemistry|5": {
        "acc": 0.4975369458128079,
        "acc_stderr": 0.03517945038691063,
        "acc_norm": 0.4975369458128079,
        "acc_norm_stderr": 0.03517945038691063
    },
    "harness|hendrycksTest-high_school_computer_science|5": {
        "acc": 0.65,
        "acc_stderr": 0.047937248544110196,
        "acc_norm": 0.65,
        "acc_norm_stderr": 0.047937248544110196
    },
    "harness|hendrycksTest-high_school_european_history|5": {
        "acc": 0.7757575757575758,
        "acc_stderr": 0.032568666616811015,
        "acc_norm": 0.7757575757575758,
        "acc_norm_stderr": 0.032568666616811015
    },
    "harness|hendrycksTest-high_school_geography|5": {
        "acc": 0.7878787878787878,
        "acc_stderr": 0.029126522834586815,
        "acc_norm": 0.7878787878787878,
        "acc_norm_stderr": 0.029126522834586815
    },
    "harness|hendrycksTest-high_school_government_and_politics|5": {
        "acc": 0.8808290155440415,
        "acc_stderr": 0.02338193534812142,
        "acc_norm": 0.8808290155440415,
        "acc_norm_stderr": 0.02338193534812142
    },
    "harness|hendrycksTest-high_school_macroeconomics|5": {
        "acc": 0.6666666666666666,
        "acc_stderr": 0.02390115797940254,
        "acc_norm": 0.6666666666666666,
        "acc_norm_stderr": 0.02390115797940254
    },
    "harness|hendrycksTest-high_school_mathematics|5": {
        "acc": 0.3333333333333333,
        "acc_stderr": 0.028742040903948485,
        "acc_norm": 0.3333333333333333,
        "acc_norm_stderr": 0.028742040903948485
    },
    "harness|hendrycksTest-high_school_microeconomics|5": {
        "acc": 0.6890756302521008,
        "acc_stderr": 0.030066761582977927,
        "acc_norm": 0.6890756302521008,
        "acc_norm_stderr": 0.030066761582977927
    },
    "harness|hendrycksTest-high_school_physics|5": {
        "acc": 0.36423841059602646,
        "acc_stderr": 0.03929111781242742,
        "acc_norm": 0.36423841059602646,
        "acc_norm_stderr": 0.03929111781242742
    },
    "harness|hendrycksTest-high_school_psychology|5": {
        "acc": 0.8165137614678899,
        "acc_stderr": 0.01659525971039931,
        "acc_norm": 0.8165137614678899,
        "acc_norm_stderr": 0.01659525971039931
    },
    "harness|hendrycksTest-high_school_statistics|5": {
        "acc": 0.5,
        "acc_stderr": 0.034099716973523674,
        "acc_norm": 0.5,
        "acc_norm_stderr": 0.034099716973523674
    },
    "harness|hendrycksTest-high_school_us_history|5": {
        "acc": 0.803921568627451,
        "acc_stderr": 0.027865942286639318,
        "acc_norm": 0.803921568627451,
        "acc_norm_stderr": 0.027865942286639318
    },
    "harness|hendrycksTest-high_school_world_history|5": {
        "acc": 0.7932489451476793,
        "acc_stderr": 0.02636165166838909,
        "acc_norm": 0.7932489451476793,
        "acc_norm_stderr": 0.02636165166838909
    },
    "harness|hendrycksTest-human_aging|5": {
        "acc": 0.6816143497757847,
        "acc_stderr": 0.03126580522513713,
        "acc_norm": 0.6816143497757847,
        "acc_norm_stderr": 0.03126580522513713
    },
    "harness|hendrycksTest-human_sexuality|5": {
        "acc": 0.7480916030534351,
        "acc_stderr": 0.03807387116306085,
        "acc_norm": 0.7480916030534351,
        "acc_norm_stderr": 0.03807387116306085
    },
    "harness|hendrycksTest-international_law|5": {
        "acc": 0.7851239669421488,
        "acc_stderr": 0.037494924487096966,
        "acc_norm": 0.7851239669421488,
        "acc_norm_stderr": 0.037494924487096966
    },
    "harness|hendrycksTest-jurisprudence|5": {
        "acc": 0.7314814814814815,
        "acc_stderr": 0.042844679680521934,
        "acc_norm": 0.7314814814814815,
        "acc_norm_stderr": 0.042844679680521934
    },
    "harness|hendrycksTest-logical_fallacies|5": {
        "acc": 0.7484662576687117,
        "acc_stderr": 0.03408997886857529,
        "acc_norm": 0.7484662576687117,
        "acc_norm_stderr": 0.03408997886857529
    },
    "harness|hendrycksTest-machine_learning|5": {
        "acc": 0.44642857142857145,
        "acc_stderr": 0.047184714852195886,
        "acc_norm": 0.44642857142857145,
        "acc_norm_stderr": 0.047184714852195886
    },
    "harness|hendrycksTest-management|5": {
        "acc": 0.7669902912621359,
        "acc_stderr": 0.04185832598928315,
        "acc_norm": 0.7669902912621359,
        "acc_norm_stderr": 0.04185832598928315
    },
    "harness|hendrycksTest-marketing|5": {
        "acc": 0.8717948717948718,
        "acc_stderr": 0.02190190511507333,
        "acc_norm": 0.8717948717948718,
        "acc_norm_stderr": 0.02190190511507333
    },
    "harness|hendrycksTest-medical_genetics|5": {
        "acc": 0.74,
        "acc_stderr": 0.04408440022768079,
        "acc_norm": 0.74,
        "acc_norm_stderr": 0.04408440022768079
    },
    "harness|hendrycksTest-miscellaneous|5": {
        "acc": 0.8186462324393359,
        "acc_stderr": 0.013778693778464074,
        "acc_norm": 0.8186462324393359,
        "acc_norm_stderr": 0.013778693778464074
    },
    "harness|hendrycksTest-moral_disputes|5": {
        "acc": 0.7254335260115607,
        "acc_stderr": 0.02402774515526502,
        "acc_norm": 0.7254335260115607,
        "acc_norm_stderr": 0.02402774515526502
    },
    "harness|hendrycksTest-moral_scenarios|5": {
        "acc": 0.47374301675977654,
        "acc_stderr": 0.016699427672784768,
        "acc_norm": 0.47374301675977654,
        "acc_norm_stderr": 0.016699427672784768
    },
    "harness|hendrycksTest-nutrition|5": {
        "acc": 0.7058823529411765,
        "acc_stderr": 0.026090162504279053,
        "acc_norm": 0.7058823529411765,
        "acc_norm_stderr": 0.026090162504279053
    },
    "harness|hendrycksTest-philosophy|5": {
        "acc": 0.7009646302250804,
        "acc_stderr": 0.02600330111788514,
        "acc_norm": 0.7009646302250804,
        "acc_norm_stderr": 0.02600330111788514
    },
    "harness|hendrycksTest-prehistory|5": {
        "acc": 0.7098765432098766,
        "acc_stderr": 0.025251173936495033,
        "acc_norm": 0.7098765432098766,
        "acc_norm_stderr": 0.025251173936495033
    },
    "harness|hendrycksTest-professional_accounting|5": {
        "acc": 0.4645390070921986,
        "acc_stderr": 0.02975238965742705,
        "acc_norm": 0.4645390070921986,
        "acc_norm_stderr": 0.02975238965742705
    },
    "harness|hendrycksTest-professional_law|5": {
        "acc": 0.42894393741851367,
        "acc_stderr": 0.012640625443067358,
        "acc_norm": 0.42894393741851367,
        "acc_norm_stderr": 0.012640625443067358
    },
    "harness|hendrycksTest-professional_medicine|5": {
        "acc": 0.6727941176470589,
        "acc_stderr": 0.028501452860396553,
        "acc_norm": 0.6727941176470589,
        "acc_norm_stderr": 0.028501452860396553
    },
    "harness|hendrycksTest-professional_psychology|5": {
        "acc": 0.6437908496732027,
        "acc_stderr": 0.019373332420724507,
        "acc_norm": 0.6437908496732027,
        "acc_norm_stderr": 0.019373332420724507
    },
    "harness|hendrycksTest-public_relations|5": {
        "acc": 0.7090909090909091,
        "acc_stderr": 0.04350271442923243,
        "acc_norm": 0.7090909090909091,
        "acc_norm_stderr": 0.04350271442923243
    },
    "harness|hendrycksTest-security_studies|5": {
        "acc": 0.7061224489795919,
        "acc_stderr": 0.02916273841024977,
        "acc_norm": 0.7061224489795919,
        "acc_norm_stderr": 0.02916273841024977
    },
    "harness|hendrycksTest-sociology|5": {
        "acc": 0.8009950248756219,
        "acc_stderr": 0.028231365092758406,
        "acc_norm": 0.8009950248756219,
        "acc_norm_stderr": 0.028231365092758406
    },
    "harness|hendrycksTest-us_foreign_policy|5": {
        "acc": 0.88,
        "acc_stderr": 0.03265986323710906,
        "acc_norm": 0.88,
        "acc_norm_stderr": 0.03265986323710906
    },
    "harness|hendrycksTest-virology|5": {
        "acc": 0.5542168674698795,
        "acc_stderr": 0.03869543323472101,
        "acc_norm": 0.5542168674698795,
        "acc_norm_stderr": 0.03869543323472101
    },
    "harness|hendrycksTest-world_religions|5": {
        "acc": 0.8362573099415205,
        "acc_stderr": 0.028380919596145866,
        "acc_norm": 0.8362573099415205,
        "acc_norm_stderr": 0.028380919596145866
    },
    "harness|truthfulqa:mc|0": {
        "mc1": 0.5152998776009792,
        "mc1_stderr": 0.017495304473187902,
        "mc2": 0.6845419346695587,
        "mc2_stderr": 0.014829461272743373
    },
    "harness|winogrande|5": {
        "acc": 0.8287292817679558,
        "acc_stderr": 0.010588417294962524
    },
    "harness|gsm8k|5": {
        "acc": 0.5663381349507203,
        "acc_stderr": 0.013650728047064688
    }
}

```