askmyteapot/GPT4-x-AlpacaDente2-30b-4bit

This is a 4-bit quantization of https://huggingface.co/Aeala/GPT4-x-AlpacaDente2-30b

My secret sauce:

Using commit 3c16fd9 of 0cc4m's GPTQ fork
Using PTB as the calibration dataset
Act-order, True-sequential, percdamp 0.1 (the default percdamp is 0.01)
No groupsize
Runs with CUDA; does not need Triton.
Quant completed on a 'Premium GPU' and 'High Memory' Google Colab.
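
For readers unfamiliar with the settings above, here is a minimal pure-Python sketch of what percdamp and act-order do to the Hessian in the reference GPTQ algorithm. It is an illustration only, not code from 0cc4m's fork: the function names `dampen` and `act_order_permutation` and the toy 3x3 matrix are hypothetical, and real implementations work on GPU tensors.

```python
def dampen(hessian, percdamp=0.01):
    """Return a copy of a square matrix with percdamp * mean(diagonal)
    added to every diagonal entry (the GPTQ dampening step)."""
    n = len(hessian)
    mean_diag = sum(hessian[i][i] for i in range(n)) / n
    damp = percdamp * mean_diag
    return [
        [hessian[i][j] + (damp if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


def act_order_permutation(hessian):
    """Return column indices sorted by descending diagonal value,
    so the columns with the largest Hessian diagonal are quantized first."""
    n = len(hessian)
    return sorted(range(n), key=lambda i: hessian[i][i], reverse=True)


# Hypothetical toy Hessian, used only to show the effect of the settings.
toy = [
    [4.0, 1.0, 0.0],
    [1.0, 2.0, 0.5],
    [0.0, 0.5, 6.0],
]

damped_default = dampen(toy, percdamp=0.01)  # GPTQ default
damped_card = dampen(toy, percdamp=0.1)      # value used for this quant
order = act_order_permutation(toy)

print("default damping diagonal:", [damped_default[i][i] for i in range(3)])
print("percdamp 0.1 diagonal:   ", [damped_card[i][i] for i in range(3)])
print("act-order column order:  ", order)
```

A larger percdamp adds more to the diagonal, which makes the Hessian inverse better conditioned at the cost of a slightly less exact error correction.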

Benchmark results (perplexity; lower is better)

Model	C4	WikiText2	PTB
Aeala's FP16	7.05504846572876	4.662261962890625	24.547462463378906
This Quant	7.326207160949707	4.957101345062256	24.941526412963867
Aeala's Quant	7.332120418548584	5.016242980957031	25.576189041137695
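
To put the table in perspective, the short sketch below computes each quant's percentage increase over the FP16 baseline. It assumes the scores are perplexities, as is usual for these datasets. The numbers are copied from the table; the helper name `relative_increase` is made up for this example.

```python
# Scores copied from the benchmark table above.
fp16 = {"C4": 7.05504846572876, "WikiText2": 4.662261962890625, "PTB": 24.547462463378906}
quants = {
    "This Quant": {"C4": 7.326207160949707, "WikiText2": 4.957101345062256, "PTB": 24.941526412963867},
    "Aeala's Quant": {"C4": 7.332120418548584, "WikiText2": 5.016242980957031, "PTB": 25.576189041137695},
}


def relative_increase(quant, baseline):
    """Percentage increase of each score over the baseline score."""
    return {name: 100.0 * (quant[name] - baseline[name]) / baseline[name] for name in baseline}


increases = {label: relative_increase(scores, fp16) for label, scores in quants.items()}

for label, per_dataset in increases.items():
    summary = ", ".join(f"{name} +{pct:.2f}%" for name, pct in per_dataset.items())
    print(f"{label}: {summary}")
```

On all three datasets this quant stays closer to the FP16 baseline than Aeala's quant does.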
