fraserlove committed on
Commit
1def9d3
1 Parent(s): c78587b

Update README.md

Files changed (1)
  1. README.md +15 -13
README.md CHANGED
@@ -67,16 +67,18 @@ output = model(**encoded)
 
 ## Evaluation
 
-| | GPT-α 124M | GPT-2 124M | GPT-Neo 125M | OPT 125M | Pythia 160M |
-|--------------------|:-----------------:|:----------:|:------------:|:--------:|:-----------:|
-| CommonSenseQA | 19.2% | 19.6% | 19.6% | **20.0%** | 19.9% |
-| PIQA | **63.1%** | 62.5% | 62.5% | 62.1% | 61.3% |
-| OpenBookQA | **29.8%** | 27.2% | 26.2% | 28.0% | 27.0% |
-| TriviaQA | **1.3%** | 0.3% | 0.7% | 1.2% | 0.4% |
-| TruthfulQA | 33.1% | 31.7% | **35.7%** | 33.5% | 34.7% |
-| MMLU | 23.3% | **25.9%** | 25.6% | **25.9%** | 25.1% |
-| WinoGrande | 50.2% | 50.0% | **51.7%** | 51.1% | 48.8% |
-| ARC Challenge | **29.2%** | 23.0% | 22.9% | 22.1% | 22.1% |
-| HellaSwag | **35.7%** | 31.6% | 30.6% | 31.7% | 30.2% |
-| GSM-8K | **2.3%** | 0.7% | 1.7% | 1.7% | 2.2% |
-| **Average Score** | **28.7%** | 27.3% | 27.7% | 27.7% | 27.2% |
+| Benchmark | GPT-α 124M | GPT-2 124M | GPT-Neo 125M | OPT 125M | Pythia 160M |
+|----------------------|:----------:|:----------:|:------------:|:----------:|:-----------:|
+| CommonSenseQA | 19.16% | 19.57% | 19.57% | **19.98%** | 19.90% |
+| PIQA | **63.06%** | 62.51% | 62.46% | 62.08% | 61.26% |
+| SIQA | **38.18%** | 36.59% | 37.21% | 37.21% | 36.69% |
+| OpenBookQA | **29.80%** | 27.20% | 26.20% | 28.00% | 27.00% |
+| TriviaQA | **1.31%** | 0.30% | 0.66% | 1.18% | 0.41% |
+| TruthfulQA | 33.13% | 31.73% | **35.70%** | 33.50% | 34.75% |
+| MMLU | 23.30% | 25.90% | 25.58% | **25.94%** | 25.10% |
+| WinoGrande | 50.20% | 50.04% | **51.70%** | 51.07% | 48.78% |
+| ARC Challenge | **29.18%** | 22.95% | 22.87% | 22.10% | 22.10% |
+| HellaSwag | **35.74%** | 31.64% | 30.58% | 31.69% | 30.15% |
+| GSM-8K | **2.27%** | 0.68% | 1.74% | 1.74% | 2.20% |
+| **Average Score** | **29.58%** | 28.10% | 28.57% | 28.59% | 28.03% |
+
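
The README does not say how the **Average Score** row is computed. A minimal sketch for checking it, assuming it is the unweighted mean of the 11 benchmark accuracies in each column (the `scores` dictionary and `averages` name are illustrative, not part of the repository):

```python
from statistics import mean

# Per-benchmark accuracies (%) copied from the added rows of the table,
# in row order from CommonSenseQA through GSM-8K.
scores = {
    "GPT-α 124M": [19.16, 63.06, 38.18, 29.80, 1.31, 33.13,
                   23.30, 50.20, 29.18, 35.74, 2.27],
    "GPT-2 124M": [19.57, 62.51, 36.59, 27.20, 0.30, 31.73,
                   25.90, 50.04, 22.95, 31.64, 0.68],
}

# Assumption: "Average Score" is the plain (unweighted) mean over benchmarks,
# rounded to two decimal places as in the table.
averages = {model: round(mean(values), 2) for model, values in scores.items()}

for model, avg in averages.items():
    print(f"{model}: {avg:.2f}%")
```

Under that assumption, the two columns shown reproduce the 29.58% and 28.10% reported in the table.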