GPTQ 4Bit Llama 3.2-1B-Instruct with 100% Accuracy recovery

#38
by Qubitium - opened

I am happy to announce that users who want even faster inference of Llama 3.2 1B Instruct, with even lower VRAM requirements, can now deploy to production via vLLM/SGLang using our highly accurate GPTQ 4-bit quantized model.

https://x.com/ModelCloudAi/status/1852249758913724752
https://huggingface.co/ModelCloud/Llama-3.2-1B-Instruct-gptqmodel-4bit-vortext-v2
