lucyknada/chargoddard_llama3-42b-v0-3.5bpw-EXL2

Meta's Llama 3 70B pruned to 42B parameters using the methodology described in "The Unreasonable Ineffectiveness of the Deeper Layers". After pruning, the model was trained with QLoRA for ~100M tokens from JeanKaddour/minipile.

The layers to prune were selected using PruneMe.
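For a rough idea of how this kind of layer selection works, here is a minimal sketch of the similarity-based criterion from the paper: measure the angular distance between the hidden state entering layer `l` and the one entering layer `l + n`, then pick the block of `n` layers where that distance is smallest. This is not PruneMe's actual code, and it is not necessarily how this model's layers were chosen. The function names and the toy data are made up for illustration, and in practice the hidden states would come from running the model on a calibration dataset.

```python
import math

def angular_distance(x, y):
    """Angular distance in [0, 1] between two hidden-state vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cos) / math.pi

def best_block_to_prune(hidden_states, n):
    """Find the block of n consecutive layers that changes the hidden state least.

    hidden_states[l] is a list of vectors (one per sample) entering layer l,
    so a model with L layers supplies L + 1 entries (hypothetical layout).
    Returns (start_layer, mean_angular_distance).
    """
    best = None
    for start in range(len(hidden_states) - n):
        pairs = zip(hidden_states[start], hidden_states[start + n])
        dists = [angular_distance(x, y) for x, y in pairs]
        mean = sum(dists) / len(dists)
        if best is None or mean < best[1]:
            best = (start, mean)
    return best

# Toy example: 4 "layers", one sample, 2-dimensional hidden states.
toy_states = [[[1.0, 0.0]], [[0.0, 1.0]], [[0.0, 1.0]], [[0.0, 1.0]], [[1.0, 0.0]]]
start, dist = best_block_to_prune(toy_states, n=2)
print(f"prune layers {start}..{start + 1}, mean angular distance {dist:.3f}")
```

In the toy data, the state is unchanged between positions 1 and 3, so the two layers starting at index 1 are the ones the criterion picks.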

Still evaluating, don't get too excited! Might be incredibly dumb. Check out these zero-shot MMLU numbers though:

Groups	Version	Filter	Metric	Value		Stderr
mmlu	N/A	none	acc	0.7319	±	0.0034
- humanities	N/A	none	acc	0.6582	±	0.0063
- other	N/A	none	acc	0.7927	±	0.0069
- social_sciences	N/A	none	acc	0.8466	±	0.0064
- stem	N/A	none	acc	0.6702	±	0.0079

5-shot:

Groups	Version	Filter	n-shot	Metric	Value		Stderr
mmlu	N/A	none	0	acc	0.7669	±	0.0034
- humanities	N/A	none	5	acc	0.7296	±	0.0062
- other	N/A	none	5	acc	0.8101	±	0.0067
- social_sciences	N/A	none	5	acc	0.8668	±	0.0060
- stem	N/A	none	5	acc	0.6825	±	0.0079
