Files changed (1) hide show
  1. README.md +27 -13
README.md CHANGED
@@ -1,21 +1,16 @@
1
  ---
2
  tags:
3
  - fp8
 
4
  ---
5
 
6
- Mixtral-8x7B-Instruct-v0.1 quantized to FP8 weights and activations, meant to be deployed in vLLM.
7
 
8
- Accuracy on MMLU:
9
- ```
10
- vllm (pretrained=nm-testing/Mixtral-8x7B-Instruct-v0.1-FP8), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
11
- | Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
12
- |------------------|-------|------|-----:|------|-----:|---|-----:|
13
- |mmlu |N/A |none | 0|acc |0.7008|± |0.0036|
14
- | - humanities |N/A |none | 5|acc |0.6453|± |0.0065|
15
- | - other |N/A |none | 5|acc |0.7692|± |0.0072|
16
- | - social_sciences|N/A |none | 5|acc |0.8083|± |0.0070|
17
- | - stem |N/A |none | 5|acc |0.6115|± |0.0083|
18
- ```
19
 
20
  Quantized using the script below:
21
 
@@ -313,4 +308,23 @@ if __name__ == "__main__":
313
 
314
  print("Exporting model with static weights and static activations")
315
  save_quantized_model(model, args.activation_scheme, args.save_dir)
316
- ```
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
  tags:
3
  - fp8
4
+ - vllm
5
  ---
6
 
7
+ # Mixtral-8x7B-Instruct-v0.1-FP8
8
 
9
+ ## Model Overview
10
+ Mixtral-8x7B-Instruct-v0.1 quantized to FP8 weights and activations, ready for inference with vLLM >= 0.5.0.
11
+
12
+ ## Usage and Creation
13
+ Produced using [AutoFP8 with calibration samples from ultrachat](https://github.com/neuralmagic/AutoFP8/blob/147fa4d9e1a90ef8a93f96fc7d9c33056ddc017a/example_dataset.py).
 
 
 
 
 
 
14
 
15
  Quantized using the script below:
16
 
 
308
 
309
  print("Exporting model with static weights and static activations")
310
  save_quantized_model(model, args.activation_scheme, args.save_dir)
311
+ ```
312
+
313
+ ## Evaluation
314
+
315
+ ### Open LLM Leaderboard evaluation scores
316
+ | | Mixtral-8x7B-Instruct-v0.1 | Mixtral-8x7B-Instruct-v0.1-FP8<br>(this model) |
317
+ | :------------------: | :----------------------: | :------------------------------------------------: |
318
+ | arc-c<br>25-shot | 71.50 | 70.05 |
319
+ | hellaswag<br>10-shot | 87.53 | 86.30 |
320
+ | mmlu<br>5-shot | 70.33 | 68.81 |
321
+ | truthfulqa<br>0-shot | 64.79 | 63.69 |
322
+ | winogrande<br>5-shot | 82.40 | 81.69 |
323
+ | gsm8k<br>5-shot | 64.36 | 59.82 |
324
+ | **Average<br>Accuracy** | **73.48** | **71.72** |
325
+ | **Recovery** | **100%** | **97.60%** |
326
+
327
+
328
+
329
+
330
+