Add F16 and BF16 quantization

#129
opened by andito
No description provided.
ggml.ai org

The problem with adding BF16 is that currently we use convert_hf_to_gguf.py to convert the HF model into F16, then use llama-quantize to quantize it.

So the conversion will be safetensors --> F16 --> BF16, which adds no benefit to the output model.
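
For reference, a minimal sketch of that current two-step flow (paths, file names, and the example quant type are placeholders, and the script is assumed to be called via subprocess):

```python
import subprocess

# Step 1: convert the HF safetensors checkpoint to an F16 GGUF.
subprocess.run(
    [
        "python", "llama.cpp/convert_hf_to_gguf.py",
        "path/to/hf-model",
        "--outtype", "f16",
        "--outfile", "model-f16.gguf",
    ],
    check=True,
)

# Step 2: quantize the F16 GGUF with llama-quantize (Q4_K_M as an example type).
subprocess.run(
    ["llama-quantize", "model-f16.gguf", "model-q4_k_m.gguf", "Q4_K_M"],
    check=True,
)
```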

What we should do here is also modify the code that runs convert_hf_to_gguf.py, so that it outputs a BF16 GGUF file directly. A sketch of that is shown below.
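
A hedged sketch of what that change could look like, assuming the space invokes the script via subprocess and that convert_hf_to_gguf.py accepts `--outtype bf16` (paths and file names below are placeholders):

```python
import subprocess

def convert_to_bf16(model_dir: str, out_file: str) -> None:
    # Ask convert_hf_to_gguf.py to write BF16 tensors directly,
    # instead of converting to F16 first and re-quantizing.
    subprocess.run(
        [
            "python", "llama.cpp/convert_hf_to_gguf.py",
            model_dir,
            "--outtype", "bf16",
            "--outfile", out_file,
        ],
        check=True,
    )

# Example usage with placeholder paths:
convert_to_bf16("path/to/hf-model", "model-bf16.gguf")
```

The same `--outtype` switch could cover the F16 case, so F16 and BF16 would both skip the llama-quantize step entirely.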

