spmurrayzzz committed
Commit: 6780f05
Parent(s): d95d34d
Update README.md
README.md CHANGED
@@ -26,3 +26,14 @@ the training dynamics specific to large language models. The dataset used in fin
 a "syndicate" of other open language models both of similar parameter size and larger. Each model would generate a
 response for a given instruction, and the group would vote on which model's response was best.
 
+## Evaluation Results
+_12.30.23_
+| Benchmark | Result |
+|------------|--------|
+| ARC | 60.84 |
+| HellaSwag | 82.91 |
+| MMLU | 60.83 |
+| TruthfulQA | 43.71 |
+| Winogrande | 78.61 |
+| GSM8K | 44.50 |
+
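The table added in this commit lists six per-benchmark scores but no aggregate. As a rough reading aid, the sketch below computes an unweighted mean of those scores. The variable names are illustrative, and treating a plain average as a meaningful summary is an assumption, not something the README states.

```python
from statistics import mean

# Scores copied verbatim from the Evaluation Results table in this commit.
results = {
    "ARC": 60.84,
    "HellaSwag": 82.91,
    "MMLU": 60.83,
    "TruthfulQA": 43.71,
    "Winogrande": 78.61,
    "GSM8K": 44.50,
}

# Assumption: every benchmark is weighted equally; the README reports no aggregate.
average = round(mean(results.values()), 2)

# Identify the strongest and weakest benchmarks for a quick overview.
best = max(results, key=results.get)
worst = min(results, key=results.get)

print(f"Unweighted mean over {len(results)} benchmarks: {average:.2f}")
print(f"Highest: {best} ({results[best]:.2f}), lowest: {worst} ({results[worst]:.2f})")
```

A single mean hides the spread between the strongest and weakest benchmarks, so the per-benchmark rows in the table remain the primary reference.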