Meta's Llama 3 70B, pruned to 42B parameters using the methodology described in The Unreasonable Ineffectiveness of the Deeper Layers. After pruning, the model was trained with QLoRA on ~100M tokens from JeanKaddour/minipile.
The layers to prune were selected using PruneMe.
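The layer-selection criterion from the paper can be sketched roughly as follows. For each candidate starting layer, it compares the hidden state entering a block of `n` layers with the hidden state leaving it using the angular distance `arccos(cos_sim) / pi`, averaged over samples. It then picks the block whose input and output are most similar. This is a minimal NumPy illustration, not PruneMe's actual code. The `hidden_states` layout is an assumption: one array of shape `(num_layers + 1, num_samples, hidden_dim)`, with index 0 as the embedding output and index `l` as the input to layer `l` (as when stacking Transformers' `output_hidden_states`). Each sample is taken to be the final-token state. The function names are hypothetical.

```python
import numpy as np


def angular_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise angular distance (1/pi) * arccos(cosine similarity), in [0, 1]."""
    cos = np.sum(a * b, axis=-1) / (
        np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    )
    # Clip to guard against rounding pushing the cosine slightly outside [-1, 1].
    return np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi


def best_block_to_prune(hidden_states: np.ndarray, n: int) -> tuple[int, float]:
    """Return (start, distance) for the block of n consecutive layers to drop.

    Assumed layout (hypothetical): hidden_states[l] is the input to layer l,
    shape (num_layers + 1, num_samples, hidden_dim). Dropping layers
    start .. start + n - 1 removes the map from hidden_states[start]
    to hidden_states[start + n].
    """
    num_layers = hidden_states.shape[0] - 1
    if not 1 <= n <= num_layers:
        raise ValueError(f"n must be in [1, {num_layers}], got {n}")
    # Mean angular distance between each block's input and output states.
    dists = [
        float(angular_distance(hidden_states[l], hidden_states[l + n]).mean())
        for l in range(num_layers - n + 1)
    ]
    start = int(np.argmin(dists))
    return start, dists[start]
```

The block with the smallest distance is the contiguous run of layers that changes the representation least. The paper treats that as the cheapest block to remove before healing the model with QLoRA fine-tuning.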
Still evaluating, so don't get too excited! It might be incredibly dumb. Check out these zero-shot MMLU numbers, though:
| Groups | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| mmlu | N/A | none | 0 | acc | 0.7319 | ± | 0.0034 |
| - humanities | N/A | none | 0 | acc | 0.6582 | ± | 0.0063 |
| - other | N/A | none | 0 | acc | 0.7927 | ± | 0.0069 |
| - social_sciences | N/A | none | 0 | acc | 0.8466 | ± | 0.0064 |
| - stem | N/A | none | 0 | acc | 0.6702 | ± | 0.0079 |
5-shot:
| Groups | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| mmlu | N/A | none | 0 | acc | 0.7669 | ± | 0.0034 |
| - humanities | N/A | none | 5 | acc | 0.7296 | ± | 0.0062 |
| - other | N/A | none | 5 | acc | 0.8101 | ± | 0.0067 |
| - social_sciences | N/A | none | 5 | acc | 0.8668 | ± | 0.0060 |
| - stem | N/A | none | 5 | acc | 0.6825 | ± | 0.0079 |