chargoddard committed d9acebe · 1 parent: ae96f79 · Update README.md
README.md CHANGED

@@ -9,23 +9,19 @@ tags:
 - mergekit
 - llama
 ---
+> 🚨 THIS IS A BASE MODEL 🚨
+>
+> This model is pruned from the base Llama 3 70B, which has no instruction tuning and randomly initialized special tokens.
+>
+> Using this with the Llama 3 instruction format is injecting random noise into latent space and will give you deranged results. (It's pretty funny, actually.)
+> Treat this as the untrained foundation model it is and use appropriate prompts.
+
 
 Meta's Llama 3 70B pruned to 42B parameters using the methodology described in [The Unreasonable Ineffectiveness of the Deeper Layers](https://arxiv.org/abs/2403.17887). Post-pruning trained using QLoRA for ~100M tokens from [JeanKaddour/minipile](https://huggingface.co/datasets/JeanKaddour/minipile).
 
 Layers to prune selected using [PruneMe](https://github.com/arcee-ai/PruneMe).
 
-Still evaluating, don't get too excited! Might be incredibly dumb. Check out these
-
-
-| Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
-|------------------|-------|------|-----:|------|-----:|---|-----:|
-|mmlu |N/A |none | 0|acc |0.7319|± |0.0034|
-| - humanities |N/A |none | 0|acc |0.6582|± |0.0063|
-| - other |N/A |none | 0|acc |0.7927|± |0.0069|
-| - social_sciences|N/A |none | 0|acc |0.8466|± |0.0064|
-| - stem |N/A |none | 0|acc |0.6702|± |0.0079|
-
-5-shot:
+Still evaluating, don't get too excited! Might be incredibly dumb. Check out these numbers though:
 
 | Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
 |------------------|-------|------|-----:|------|-----:|---|-----:|
@@ -34,5 +30,8 @@ Still evaluating, don't get too excited! Might be incredibly dumb. Check out the
 | - other |N/A |none | 5|acc |0.8101|± |0.0067|
 | - social_sciences|N/A |none | 5|acc |0.8668|± |0.0060|
 | - stem |N/A |none | 5|acc |0.6825|± |0.0079|
+|winogrande| 1|none | 5|acc |0.8027|± |0.0112|
+|hellaswag| 1|none | 10|acc_norm|0.8025|± |0.0040|
+
 
 [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
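For readers curious how the layer selection referenced above works: PruneMe implements the heuristic from the linked paper, which drops the block of consecutive layers whose input and output hidden states are most similar (smallest mean angular distance). A minimal NumPy sketch of that heuristic, assuming per-layer activations of shape `(tokens, hidden_dim)` — the function names here are illustrative, not PruneMe's actual API:

```python
import numpy as np

def angular_distance(a, b):
    """Per-token angular distance (in units of pi) between two
    (tokens, hidden_dim) activation matrices."""
    cos = np.sum(a * b, axis=-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    )
    # Clip to guard against floating-point drift outside [-1, 1].
    return np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi

def best_block_to_prune(hidden_states, n):
    """Return the start index l of the n-layer block to prune:
    the l that minimizes the mean angular distance between the
    activations entering layer l (hidden_states[l]) and those
    leaving layer l + n - 1 (hidden_states[l + n])."""
    scores = [
        angular_distance(hidden_states[l], hidden_states[l + n]).mean()
        for l in range(len(hidden_states) - n)
    ]
    return int(np.argmin(scores))
```

In practice the hidden states would come from forward passes over a calibration set (e.g. `output_hidden_states=True` in transformers); the block starting at the returned index is the one removed before "healing" the model with QLoRA.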
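To make the base-model warning concrete, here is a toy illustration (my own helper, not part of any library) of the difference: a base model should get a plain few-shot continuation prompt, never the Llama 3 instruct template's special tokens:

```python
# Special tokens from the Llama 3 *instruct* chat template -- exactly
# what the warning above says not to feed this untuned base model.
INSTRUCT_TOKENS = ("<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>")

def completion_prompt(examples, query):
    """Build a plain few-shot continuation prompt for a base model.

    examples: list of (question, answer) pairs used as demonstrations.
    query: the new question; the model continues after the final 'A:'.
    """
    lines = []
    for question, answer in examples:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
    lines.append(f"Q: {query}")
    lines.append("A:")  # generation picks up from here
    return "\n".join(lines)
```

The point is simply that the prompt is ordinary text the pretraining distribution has seen, with no chat-template markup anywhere in it.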