Commit 1def9d3 by fraserlove (1 parent: c78587b)
Update README.md
README.md
CHANGED
@@ -67,16 +67,18 @@ output = model(**encoded)
 
 ## Evaluation
 
-[13 removed lines, old 70-82, not recoverable from this extraction; surviving fragments: "| CommonSenseQA", "| PIQA"]
+| Benchmark            | GPT-α 124M | GPT-2 124M | GPT-Neo 125M |  OPT 125M  | Pythia 160M |
+|----------------------|:----------:|:----------:|:------------:|:----------:|:-----------:|
+| CommonSenseQA        |   19.16%   |   19.57%   |    19.57%    | **19.98%** |   19.90%    |
+| PIQA                 | **63.06%** |   62.51%   |    62.46%    |   62.08%   |   61.26%    |
+| SIQA                 | **38.18%** |   36.59%   |    37.21%    |   37.21%   |   36.69%    |
+| OpenBookQA           | **29.80%** |   27.20%   |    26.20%    |   28.00%   |   27.00%    |
+| TriviaQA             | **1.31%**  |   0.30%    |    0.66%     |   1.18%    |   0.41%     |
+| TruthfulQA           |   33.13%   |   31.73%   |  **35.70%**  |   33.50%   |   34.75%    |
+| MMLU                 |   23.30%   |   25.90%   |    25.58%    | **25.94%** |   25.10%    |
+| WinoGrande           |   50.20%   |   50.04%   |  **51.70%**  |   51.07%   |   48.78%    |
+| ARC Challenge        | **29.18%** |   22.95%   |    22.87%    |   22.10%   |   22.10%    |
+| HellaSwag            | **35.74%** |   31.64%   |    30.58%    |   31.69%   |   30.15%    |
+| GSM-8K               | **2.27%**  |   0.68%    |    1.74%     |   1.74%    |   2.20%     |
+| **Average Score**    | **29.58%** |   28.10%   |    28.57%    |   28.59%   |   28.03%    |
+
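The `Average Score` row added in this diff can be sanity-checked with a few lines of Python. The sketch below assumes the average is an unweighted mean over the eleven benchmark rows; the diff itself does not say how it was computed. The names `MODELS`, `SCORES`, `average_scores`, and `best_models` are hypothetical helpers for illustration, not part of the repository.

```python
# Scores copied from the table added in this diff (percentages).
MODELS = ["GPT-α 124M", "GPT-2 124M", "GPT-Neo 125M", "OPT 125M", "Pythia 160M"]
SCORES = {
    "CommonSenseQA": [19.16, 19.57, 19.57, 19.98, 19.90],
    "PIQA":          [63.06, 62.51, 62.46, 62.08, 61.26],
    "SIQA":          [38.18, 36.59, 37.21, 37.21, 36.69],
    "OpenBookQA":    [29.80, 27.20, 26.20, 28.00, 27.00],
    "TriviaQA":      [1.31, 0.30, 0.66, 1.18, 0.41],
    "TruthfulQA":    [33.13, 31.73, 35.70, 33.50, 34.75],
    "MMLU":          [23.30, 25.90, 25.58, 25.94, 25.10],
    "WinoGrande":    [50.20, 50.04, 51.70, 51.07, 48.78],
    "ARC Challenge": [29.18, 22.95, 22.87, 22.10, 22.10],
    "HellaSwag":     [35.74, 31.64, 30.58, 31.69, 30.15],
    "GSM-8K":        [2.27, 0.68, 1.74, 1.74, 2.20],
}


def average_scores(scores):
    """Unweighted mean per model over all benchmarks (assumed method)."""
    n_benchmarks = len(scores)
    # zip(*rows) transposes benchmark rows into per-model columns.
    columns = zip(*scores.values())
    return [round(sum(col) / n_benchmarks, 2) for col in columns]


def best_models(scores, models):
    """Map each benchmark to the model(s) with the highest score."""
    best = {}
    for benchmark, row in scores.items():
        top = max(row)
        best[benchmark] = [m for m, s in zip(models, row) if s == top]
    return best


averages = dict(zip(MODELS, average_scores(SCORES)))
winners = best_models(SCORES, MODELS)

for model, avg in averages.items():
    print(f"{model}: {avg:.2f}%")
```

Under that assumption the recomputed means agree with the `Average Score` row, and `winners` lists the same models that the table marks in bold.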