Update README.md
README.md
CHANGED
@@ -81,6 +81,9 @@ We use [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-eval
 | MMLU | 67.9 |
 | **Avg.** | **53.3** |
 
+This places DiscoLM 120b firmly ahead of gpt-3.5-turbo-0613, as seen in the screenshot of the current (sadly no longer maintained) FastEval CoT leaderboard:
+![FastEval Leaderboard](imgs/cot_leaderboard.png)
+
 ### MTBench
 
 ```json
@@ -100,6 +103,8 @@ We use [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-eval
     "average": 7.95
 }
 ```
+Screenshot of the current FastEval MT Bench leaderboard:
+![FastEval Leaderboard](imgs/mtbench_leaderboard.png)
 
 ## Prompt Format
 
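For readers who want to check numbers like the MMLU score above, here is a minimal sketch using the Python API of EleutherAI's Language Model Evaluation Harness (available in lm-eval 0.4+), which the README says was used for these benchmarks. The model id `DiscoResearch/DiscoLM-120b` and the dtype setting are illustrative assumptions, not taken from this diff:

```python
# Hedged sketch: scoring MMLU with EleutherAI's lm-evaluation-harness
# (Python API in lm-eval >= 0.4). The model id and dtype below are
# illustrative assumptions, not values from this README.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=DiscoResearch/DiscoLM-120b,dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=5,  # MMLU is conventionally reported 5-shot
)

# Aggregate accuracy for the MMLU task group
print(results["results"]["mmlu"])
```

The same call with a different `tasks` list covers the other harness benchmarks in the table; MTBench scores come from FastEval rather than the harness, so they are not reproducible this way.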