Update README.md
README.md CHANGED
@@ -19,6 +19,8 @@ Training took 20 hours on a single 4090 GPU, giving the following graphs:
 
 ![gpt2-124M-edu-fineweb-10B](https://huggingface.co/rhysjones/gpt2-124M-edu-fineweb-10B/resolve/main/graph.png)
 
+## Training
+
 The training parameters were:
 ```
 ./train_gpt2cu \
@@ -37,4 +39,18 @@ The training parameters were:
 -n 5000 \
 -v 250 -s 20000 \
 -h 1
 ```
+
+The model has had no further finetuning.
+
+## Evaluation
+Evals using the [Eleuther AI Harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/b281b0921b636bc36ad05c0b0b0763bd6dd43463) give:
+| Eval Test | Score |
+| --------- | ----- |
+| arc_challenge (25 shot) | 24.83 |
+| gsm8k (5 shot) | 0.00 |
+| hellaswag (10 shot) | 32.52 |
+| mmlu (5 shot) | 25.95 |
+| truthfulqa (0 shot) | 42.45 |
+| winogrande (5 shot) | 53.35 |
+| **Overall Score** | **29.85** |
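
The Overall Score reported in the evaluation table is consistent with a plain unweighted mean of the six task scores. The README does not state how the figure was derived, so the averaging rule in this minimal Python sketch is an assumption; the numbers themselves are copied from the table.

```python
# Scores copied from the evaluation table (percentages).
scores = {
    "arc_challenge": 24.83,  # 25 shot
    "gsm8k": 0.00,           # 5 shot
    "hellaswag": 32.52,      # 10 shot
    "mmlu": 25.95,           # 5 shot
    "truthfulqa": 42.45,     # 0 shot
    "winogrande": 53.35,     # 5 shot
}

# Assumption: the overall score is the unweighted mean of the six tasks,
# rounded to two decimal places.
overall = round(sum(scores.values()) / len(scores), 2)
print(f"{overall:.2f}")
```

Under that assumption the script prints 29.85, which matches the Overall Score row.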