flan-ul2 4-bit 128-groupsize GPTQ
Quantized using qwopqwop200's GPTQ-for-Llama repo on the t5 branch.
The original model can be found here: Google/flan-ul2
Quantization command:
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 python t5.py ../full-models/flan-ul2 wikitext2 --nsamples 256 --wbits 4 --act-order --groupsize 128 --save ../gptq-models/flan-ul2-gptq/flan-ul2-4bit-128g-gptq.pt
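The --wbits 4 --groupsize 128 flags store each weight as a 4-bit integer (0 to 15), and each group of 128 consecutive weights along a layer's input dimension shares one scale and one zero point. The pure-Python sketch below illustrates only that storage scheme, using plain round-to-nearest min/max quantization. It is not GPTQ itself: GPTQ quantizes columns one at a time and updates the not-yet-quantized weights using second-order (Hessian) information to compensate for the error, and --act-order additionally processes columns in order of decreasing Hessian diagonal. All function names are hypothetical. bits_per_weight assumes fp16 scales and zero points packed at wbits bits; this is an assumption about the checkpoint layout, not something stated in this card.

```python
"""Sketch: 4-bit, group-size-128 round-to-nearest quantization (not GPTQ)."""
import random


def quantize_group(weights, wbits=4):
    """Map one group of floats to unsigned wbits-bit ints sharing a scale and zero point."""
    maxq = 2 ** wbits - 1  # 15 for 4-bit
    # Min/max range, widened to include 0.0 so that zero stays representable.
    xmin = min(min(weights), 0.0)
    xmax = max(max(weights), 0.0)
    if xmin == xmax:  # only happens when every weight is 0.0
        xmin, xmax = -1.0, 1.0
    scale = (xmax - xmin) / maxq
    zero = round(-xmin / scale)
    # Round to the nearest step, shift by the zero point, clamp to [0, maxq].
    q = [min(max(round(w / scale) + zero, 0), maxq) for w in weights]
    return q, scale, zero


def dequantize_group(q, scale, zero):
    """Reconstruct approximate floats from the integer codes."""
    return [scale * (v - zero) for v in q]


def quantize_row(row, wbits=4, groupsize=128):
    """Split one weight row into consecutive groups and quantize each one separately."""
    return [quantize_group(row[i:i + groupsize], wbits)
            for i in range(0, len(row), groupsize)]


def bits_per_weight(wbits=4, groupsize=128, scale_bits=16, zero_bits=4):
    """Average storage per weight, counting one scale and one zero point per group (assumed layout)."""
    return wbits + (scale_bits + zero_bits) / groupsize


if __name__ == "__main__":
    random.seed(0)
    row = [random.gauss(0.0, 0.02) for _ in range(512)]  # hypothetical weights
    groups = quantize_row(row)
    recon = [x for q, s, z in groups for x in dequantize_group(q, s, z)]
    max_err = max(abs(a - b) for a, b in zip(row, recon))
    print(len(groups))        # 4 groups of 128
    print(max_err)
    print(bits_per_weight())  # 4.15625
```

With these defaults the per-group overhead is 20/128, roughly 0.16 extra bits per weight, or about 4.16 bits in total. This ignores any other metadata the real checkpoint format may store. The reconstruction error of each weight stays within roughly half a quantization step of its group.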
Benchmark command:
python t5.py ../full-models/flan-ul2 wikitext2 --load ../gptq-models/flan-ul2-gptq/flan-ul2-4bit-128g-gptq2.pt --wbits 4 --groupsize 128 --benchmark --benchmark_mode mmlu
Results (MMLU):
Average accuracy 0.289 - math
Average accuracy 0.562 - health
Average accuracy 0.416 - physics
Average accuracy 0.780 - business
Average accuracy 0.610 - biology
Average accuracy 0.446 - chemistry
Average accuracy 0.461 - computer science
Average accuracy 0.513 - economics
Average accuracy 0.538 - engineering
Average accuracy 0.455 - philosophy
Average accuracy 0.622 - other
Average accuracy 0.703 - history
Average accuracy 0.707 - geography
Average accuracy 0.718 - politics
Average accuracy 0.653 - psychology
Average accuracy 0.711 - culture
Average accuracy 0.447 - law
Average accuracy 0.416 - STEM
Average accuracy 0.501 - humanities
Average accuracy 0.643 - social sciences
Average accuracy 0.613 - other (business, health, misc.)
MMLU Average accuracy: 0.540
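The overall MMLU figure (0.540) is slightly lower than the simple average of the four category scores (about 0.543). Each displayed score is rounded to three decimals, which can shift that average by at most 0.0005, so rounding alone does not explain the gap. A likely explanation, assuming t5.py follows the common MMLU evaluation code, is that the overall number pools every question, so categories with more questions carry more weight. The pure-Python sketch below contrasts the two aggregations using the category scores above and a toy example. The function names and toy data are hypothetical.

```python
"""Sketch: unweighted category mean vs. question-pooled accuracy for MMLU."""

# Category accuracies copied from the results above (already rounded to 3 decimals).
CATEGORY_ACC = {
    "STEM": 0.416,
    "humanities": 0.501,
    "social sciences": 0.643,
    "other (business, health, misc.)": 0.613,
}
REPORTED_MMLU = 0.540


def unweighted_mean(values):
    """Plain average: every category counts equally, regardless of its size."""
    values = list(values)
    return sum(values) / len(values)


def pooled_accuracy(correct_flags_per_group):
    """Accuracy over all questions pooled together: larger groups weigh more."""
    total = sum(len(flags) for flags in correct_flags_per_group)
    correct = sum(sum(flags) for flags in correct_flags_per_group)
    return correct / total


if __name__ == "__main__":
    mean = unweighted_mean(CATEGORY_ACC.values())
    print(f"unweighted category mean: {mean:.3f}")  # 0.543
    print(f"reported MMLU average:    {REPORTED_MMLU:.3f}")
    print(f"gap: {abs(mean - REPORTED_MMLU):.4f}")

    # Hypothetical toy data: a small group at 50% and a larger one at 90%.
    small = [1, 0]
    large = [1] * 9 + [0]
    print(unweighted_mean([pooled_accuracy([small]), pooled_accuracy([large])]))
    print(pooled_accuracy([small, large]))  # 10 of 12 questions correct
```

In the toy case the unweighted mean is 0.7 while the pooled accuracy is about 0.833, because the larger group dominates the pooled figure.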