
# flan-ul2 4-bit 128-groupsize GPTQ

Quantized with qwopqwop200's [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) repo on the `t5` branch.
The original model can be found at [google/flan-ul2](https://huggingface.co/google/flan-ul2).
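
For reference, the unquantized base model can be loaded directly with Hugging Face Transformers. A minimal sketch (assumes `transformers` and `accelerate` are installed; not part of the original card):

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-ul2",
    torch_dtype=torch.bfloat16,  # roughly 40 GB of weights in bf16
    device_map="auto",           # shard across available GPUs / CPU
)

inputs = tokenizer(
    "Answer the following question. What is the capital of France?",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```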

Quantization command:

```sh
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 python t5.py ../full-models/flan-ul2 wikitext2 --nsamples 256 --wbits 4 --act-order --groupsize 128 --save ../gptq-models/flan-ul2-gptq/flan-ul2-4bit-128g-gptq.pt
```
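
Flag notes: `--wbits 4 --groupsize 128` stores the weights at 4 bits with one scale/zero-point per group of 128 weights, `--act-order` quantizes columns in order of decreasing activation magnitude (which usually improves accuracy), and `wikitext2 --nsamples 256` draws 256 calibration samples from WikiText-2; `PYTORCH_CUDA_ALLOC_CONF` only limits allocator fragmentation on a model this large.

For intuition, here is a sketch of plain round-to-nearest 4-bit quantization with group size 128, i.e. the storage format this checkpoint uses. GPTQ itself goes further and compensates the rounding error column by column, so treat this as a baseline illustration rather than the repo's algorithm; all names are made up:

```python
import torch

def quantize_rtn_4bit(w: torch.Tensor, group_size: int = 128):
    """Round-to-nearest 4-bit quantization, one (scale, zero) per group.

    Illustrative only: GPTQ additionally corrects the error each rounding
    step introduces into the remaining weights, which is why it beats RTN.
    """
    out_features, in_features = w.shape
    wg = w.reshape(out_features, in_features // group_size, group_size)
    wmin = wg.min(dim=-1, keepdim=True).values
    wmax = wg.max(dim=-1, keepdim=True).values
    scale = (wmax - wmin).clamp(min=1e-8) / 15            # 16 levels for 4 bits
    zero = torch.round(-wmin / scale)
    q = torch.clamp(torch.round(wg / scale) + zero, 0, 15)
    w_hat = (q - zero) * scale                            # what inference sees
    return q.to(torch.uint8), scale, zero, w_hat.reshape(out_features, in_features)

w = torch.randn(64, 256)
q, scale, zero, w_hat = quantize_rtn_4bit(w)
print("mean abs quantization error:", (w - w_hat).abs().mean().item())
```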

Benchmark command:

```sh
python t5.py ../full-models/flan-ul2 wikitext2 --load ../gptq-models/flan-ul2-gptq/flan-ul2-4bit-128g-gptq2.pt --wbits 4 --groupsize 128 --benchmark --benchmark_mode mmlu
```
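
`--benchmark_mode mmlu` scores the model on the MMLU multiple-choice suite. A rough sketch of that style of evaluation, reusing the `model` and `tokenizer` from the loading example above; the prompt template and answer parsing are assumptions, not the repo's exact harness:

```python
def mmlu_accuracy(model, tokenizer, questions):
    """questions: dicts with 'question', 'choices' (four strings), 'answer' (0-3)."""
    correct = 0
    for item in questions:
        options = "\n".join(f"{letter}. {text}"
                            for letter, text in zip("ABCD", item["choices"]))
        prompt = f"{item['question']}\n{options}\nAnswer:"
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=2)
        pred = tokenizer.decode(out[0], skip_special_tokens=True).strip()[:1]
        correct += pred == "ABCD"[item["answer"]]
    return correct / len(questions)
```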

Results:

| Category         | Average accuracy |
|------------------|-----------------:|
| math             | 0.289 |
| health           | 0.562 |
| physics          | 0.416 |
| business         | 0.780 |
| biology          | 0.610 |
| chemistry        | 0.446 |
| computer science | 0.461 |
| economics        | 0.513 |
| engineering      | 0.538 |
| philosophy       | 0.455 |
| other            | 0.622 |
| history          | 0.703 |
| geography        | 0.707 |
| politics         | 0.718 |
| psychology       | 0.653 |
| culture          | 0.711 |
| law              | 0.447 |

Aggregate scores over the standard MMLU super-categories:

| Aggregate                       | Average accuracy |
|---------------------------------|-----------------:|
| STEM                            | 0.416 |
| humanities                      | 0.501 |
| social sciences                 | 0.643 |
| other (business, health, misc.) | 0.613 |
| **MMLU average**                | **0.540** |