Will a quantised version be available?

by angerhang - opened Oct 17

Oct 17

Thanks for sharing. What are the recommended ways to quantise this model?
Or will a quantised model be made available so that inference is less resource-intensive?

Thanks

victor

Oct 17

Did you see https://huggingface.co/models?other=base_model:quantized:nvidia/Llama-3.1-Nemotron-70B-Instruct-HF?
Use the model tree section on model pages to see what quantizations are available.
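For anyone who prefers to query this from a script, here is a minimal sketch of the same lookup. It rebuilds the filter URL from the link above and, optionally, lists matching repos through `huggingface_hub`. The helper names (`quantized_tag`, `hub_search_url`, `list_quantized`) are made up for illustration, and the sketch assumes that `HfApi.list_models` accepts the `base_model:quantized:...` tag as a `filter` string in your installed version.

```python
from urllib.parse import urlencode

# Base model discussed in this thread.
BASE_MODEL = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"


def quantized_tag(base_model: str) -> str:
    """Hub tag carried by repos that declare themselves quantizations of base_model."""
    return f"base_model:quantized:{base_model}"


def hub_search_url(base_model: str) -> str:
    """Rebuild the browser URL from the post above for any base model."""
    query = urlencode({"other": quantized_tag(base_model)}, safe=":/")
    return "https://huggingface.co/models?" + query


def list_quantized(base_model: str, limit: int = 20) -> list[str]:
    """Hypothetical helper: return repo ids tagged as quantizations (needs network)."""
    # Imported lazily so the URL helpers work without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()
    # Assumption: the tag string is accepted as a plain `filter` value.
    return [m.id for m in api.list_models(filter=quantized_tag(base_model), limit=limit)]


if __name__ == "__main__":
    print(hub_search_url(BASE_MODEL))
```

Calling `list_quantized(BASE_MODEL)` should return whatever community repos carry that tag at the time of the call, so the result will change as new quantizations are uploaded.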

okuchaiev

NVIDIA org Oct 18

NVIDIA hasn't released a quantized version yet, but there are several community quantization efforts, as mentioned above.

yangwang92

Oct 22

We also provide quantized versions from 4 bits down to 1.5 bits, made with VPTQ (https://github.com/microsoft/VPTQ), here: https://huggingface.co/collections/VPTQ-community/vptq-llama-31-nemotron-70b-instruct-hf-without-finetune-671730b96f16208d0b3fe942 . Feel free to give us feedback!

mysticbeing

17 days ago (edited 17 days ago)

Runs on 1x H100 / A100 (80GB): https://huggingface.co/mysticbeing/Llama-3.1-Nemotron-70B-Instruct-HF-FP8-DYNAMIC
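A rough back-of-the-envelope sketch of why the bit width matters for fitting on a single 80GB card. It counts weights only and assumes about 70 billion parameters (read off the "70B" in the model name, not an exact count). KV cache, activations, and any layers kept at higher precision are ignored, so real usage will be higher. The function names are illustrative.

```python
# Assumption: ~70e9 parameters, taken from the "70B" in the model name.
PARAMS = 70e9


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Weights-only footprint in decimal GB at a given average bit width."""
    return params * bits_per_weight / 8 / 1e9


def weights_fit(params: float, bits_per_weight: float, gpu_gb: float = 80.0) -> bool:
    """True if the weights alone are smaller than the GPU memory.

    This is a necessary condition, not a sufficient one: the KV cache and
    activations also need room at inference time.
    """
    return weight_gb(params, bits_per_weight) < gpu_gb


if __name__ == "__main__":
    # 16 = BF16/FP16, 8 = FP8, 4 and 1.5 = the range mentioned for VPTQ above.
    for bits in (16, 8, 4, 1.5):
        size = weight_gb(PARAMS, bits)
        print(f"{bits:>4} bits: {size:7.2f} GB weights, fits in 80 GB: {weights_fit(PARAMS, bits)}")
```

Under these assumptions the 16-bit weights alone exceed 80GB, while the 8-bit weights leave some headroom on one card, which is consistent with the FP8 repo above being described as a single-GPU option.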
