Nekochu committed on
Commit
d713289
1 Parent(s): ac0853b

Update README.md

Files changed (1)
  1. README.md +8 -4
README.md CHANGED
@@ -180,10 +180,14 @@ Note: Output from inference [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Fac

  </details>

- # Eval
+ <details>
+ <summary>Eval</summary>

- [MMLU-Pro](https://github.com/chigkim/Ollama-MMLU-Pro):[logs](https://pastebin.com/WsTUCduN) (en):
+ [MMLU-Pro](https://github.com/chigkim/Ollama-MMLU-Pro)[*](https://pastebin.com/a8xRqXtg) (en):
  | Model | Overall Accuracy | biology | business | chemistry | computer science | economics | engineering | health | history | law | math | philosophy | physics | psychology | other |
  |----------------------------------|----------------------|---------|----------|-----------|------------------|-----------|-------------|--------|---------|------|-------|------------|---------|------------|-------|
- | Llama-3.1-8B-Instruct-exl2-8bpw-h8 | 38.83 | 60.81 | 37.26 | 32.86 | 38.78 | 46.33 | 23.32 | 45.48 | 39.90 | 21.62 | 38.86 | 34.67 | 28.79 | 50.63 | 44.26 |
- | **Llama-3.1-8B-German-ORPO-8.0bpw-h8-exl2** | 46.16 | 63.74 | 49.68 | 36.93 | 48.29 | 55.81 | 28.59 | 52.81 | 45.67 | 30.79 | 45.08 | 40.48 | 39.03 | 60.90 | 48.38 |
+ | Llama-3.1-8B-German-ORPO-8.0bpw-h8-exl2 | 38.83 | 60.81 | 37.26 | 32.86 | 38.78 | 46.33 | 23.32 | 45.48 | 39.90 | 21.62 | 38.86 | 34.67 | 28.79 | 50.63 | 44.26 |
+ | Llama-3.1-8B-Instruct-exl2-8bpw-h8 | 46.16 | 63.74 | 49.68 | 36.93 | 48.29 | 55.81 | 28.59 | 52.81 | 45.67 | 30.79 | 45.08 | 40.48 | 39.03 | 60.90 | 48.38 |
+
+ Note: English performance appears degraded, and the output occasionally repeats sentences (because of the wrong chat template).
+ </details>