rhysjones/gpt2-124M-edu-fineweb-10B

rhysjones committed on Jun 7

Commit 029b2e5 • 1 Parent(s): 7ecae3b

Update README.md

Files changed (1)

README.md +17 -1

@@ -19,6 +19,8 @@ Training took 20 hours on a single 4090 GPU, giving the following graphs:
 ![gpt2-124M-edu-fineweb-10B](https://huggingface.co/rhysjones/gpt2-124M-edu-fineweb-10B/resolve/main/graph.png)
+## Training
 The training parameters were:
 ```
 ./train_gpt2cu \
@@ -37,4 +39,18 @@ The training parameters were:
     -n 5000 \
     -v 250 -s 20000 \
     -h 1
-```
+```
+The model has had no further finetuning.
+## Evaluation
+Evaluations using the [EleutherAI LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/b281b0921b636bc36ad05c0b0b0763bd6dd43463) give:
+| Eval Test | Score |
+| --------- | ----- |
+| arc_challenge (25 shot) | 24.83 |
+| gsm8k (5 shot) | 0.00 |
+| hellaswag (10 shot) | 32.52 |
+| mmlu (5 shot) | 25.95 |
+| truthfulqa (0 shot) | 42.45 |
+| winogrande (5 shot) | 53.35 |
+| **Overall Score** | **29.85** |
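
The README does not say how the Overall Score is derived. A minimal sketch, assuming it is the unweighted mean of the six benchmark scores in the table (the `scores` dictionary and variable names are illustrative, not part of the repository):

```python
# Benchmark scores copied from the evaluation table in the diff above.
scores = {
    "arc_challenge": 24.83,  # 25 shot
    "gsm8k": 0.00,           # 5 shot
    "hellaswag": 32.52,      # 10 shot
    "mmlu": 25.95,           # 5 shot
    "truthfulqa": 42.45,     # 0 shot
    "winogrande": 53.35,     # 5 shot
}

# Assumption: the overall score is a plain arithmetic mean of the six tasks.
overall = sum(scores.values()) / len(scores)
print(f"{overall:.2f}")  # prints 29.85
```

Under that assumption the result matches the reported 29.85. The gsm8k score of 0.00 counts as a full entry in the mean.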