Must be a quant
This model is the same size as the 4-bit quantization. Are you sure this isn't a 4-bit quantization that's just being "expanded" to bf16 at runtime? The pretrained model mlx-community/Meta-Llama-3.1-70B-bf16 is several times larger than this instruct model.
The config file says this:

```json
"quantization": {
    "group_size": 64,
    "bits": 4
}
```

So it appears to be a 4-bit quantized model.
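For anyone who wants to verify this without pulling the full weights, here is a minimal sketch that fetches only config.json from the Hub and looks for a "quantization" key (the repo id below is an assumption, substitute the repo this discussion is attached to):

```python
# Minimal sketch: inspect config.json without downloading the weights.
import json
from huggingface_hub import hf_hub_download

config_path = hf_hub_download(
    repo_id="mlx-community/Meta-Llama-3.1-70B-Instruct-bf16",  # assumed repo id
    filename="config.json",
)
with open(config_path) as f:
    config = json.load(f)

# A genuine bf16 export should have no "quantization" entry at all.
print(config.get("quantization"))
```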
If I wanted to upload a proper bf16 MLX conversion to replace this one, how would I go about that? I know I could make my own model repo, but how would I get permission to update this incorrect one?
Not sure, but you can probably join the community (https://huggingface.co/mlx-community) and upload the correct bf16 model (via + New) with a slightly different name indicating that it is the correct one.
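If it helps, here is a rough sketch of producing and uploading a true bf16 conversion with mlx-lm's Python API. Argument names can differ between mlx-lm versions, and both the source and target repo ids below are assumptions, so adjust accordingly:

```python
# Rough sketch, assuming a recent mlx-lm (pip install mlx-lm); check
# `python -m mlx_lm.convert --help` if the arguments have changed.
from mlx_lm import convert

convert(
    hf_path="meta-llama/Meta-Llama-3.1-70B-Instruct",  # assumed source repo with the original weights
    mlx_path="Meta-Llama-3.1-70B-Instruct-bf16",       # local output directory
    dtype="bfloat16",                                   # keep bf16 weights, do NOT pass quantize=True
    upload_repo="mlx-community/Meta-Llama-3.1-70B-Instruct-bf16-fixed",  # hypothetical new repo name
)
```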
Fixed ✅