Add F16 and BF16 quantization #129
opened by andito (HF staff)
No description provided.
The problem with adding BF16 is that currently we use convert_hf_to_gguf.py to convert the HF model into F16, then use llama-quantize to quantize it. So the conversion path would be safetensors --> F16 --> BF16, which adds no benefit to the output model. What we should do here is also modify the code that runs convert_hf_to_gguf.py so it outputs a BF16 GGUF file directly.
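
A minimal sketch of what that could look like, assuming the Space shells out to convert_hf_to_gguf.py (recent llama.cpp versions of that script accept --outtype bf16); the convert_to_gguf helper and the paths here are illustrative, not the Space's actual code:

```python
import subprocess

def convert_to_gguf(model_dir: str, out_path: str, quant_type: str) -> None:
    """Convert an HF safetensors model to GGUF.

    For F16/BF16 the conversion script can emit the target type directly,
    so the llama-quantize step is skipped; other types still go through
    the usual F16 intermediate followed by llama-quantize.
    """
    direct_types = {"F16": "f16", "BF16": "bf16"}

    if quant_type in direct_types:
        # Emit the requested type straight from the conversion script,
        # avoiding the pointless safetensors --> F16 --> BF16 round trip.
        subprocess.run(
            ["python", "llama.cpp/convert_hf_to_gguf.py", model_dir,
             "--outtype", direct_types[quant_type], "--outfile", out_path],
            check=True,
        )
    else:
        # Existing flow: convert to an F16 intermediate, then quantize it.
        fp16_path = out_path + ".fp16.gguf"
        subprocess.run(
            ["python", "llama.cpp/convert_hf_to_gguf.py", model_dir,
             "--outtype", "f16", "--outfile", fp16_path],
            check=True,
        )
        subprocess.run(
            ["./llama.cpp/llama-quantize", fp16_path, out_path, quant_type],
            check=True,
        )
```

With something along these lines, selecting F16 or BF16 in the UI would map to a single conversion call instead of conversion plus re-quantization.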