Title says 70B, but files are for 8B
Hi,
Looks like the 8B files were uploaded instead of the 70B ones, folks.
Cheers,
Will
It also looks like only the first of the multi-part files was quantised or posted.
Hey folks,
My bad, the autoquant step for GGUF was still pointing at 8B. I'm uploading some of the 70B quants now, though I'm still figuring out the best way to shard the larger models (if GGUF supports it) - rough sketch below.
Thanks for your patience!
Ethan
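For the sharding question: recent llama.cpp builds ship a gguf-split tool that breaks a large GGUF into size-capped parts (and can merge them back). A minimal sketch, assuming a local llama.cpp build and hypothetical file names - the tool name, flags, and paths aren't from this repo:

```python
import subprocess
from pathlib import Path

# Hypothetical local paths - adjust to wherever llama.cpp and the quant live.
GGUF_SPLIT = Path("llama.cpp/gguf-split")          # assumed tool name in recent llama.cpp builds
SRC = Path("dolphin-2.9-llama3-70b.Q4_K_M.gguf")   # hypothetical 70B quant file
OUT_PREFIX = "dolphin-2.9-llama3-70b.Q4_K_M"       # shards come out as <prefix>-0000N-of-0000M.gguf

# Split into parts no larger than ~45 GB so each shard stays under common
# upload limits; --split-max-size is assumed to accept a size string like "45G".
subprocess.run(
    [str(GGUF_SPLIT), "--split", "--split-max-size", "45G", str(SRC), OUT_PREFIX],
    check=True,
)

# List the resulting shards.
print(sorted(Path(".").glob(f"{OUT_PREFIX}-*.gguf")))
```

On recent builds, loading the first shard should pull in the rest automatically, so nobody has to merge the parts before use.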
In the meantime, I would recommend trying the EXL2 quants, which are definitely the 70B :D
No worries man, thanks for all the hard work!
There is one other set of GGUFs available that people can try too: https://huggingface.co/mradermacher/dolphin-2.9-llama3-70b-GGUF
EDIT: the mradermacher version is incoherent, don't use it. Best to use crusoeai's one when it's out.
Fixed, please message me or open an issue if you run into anything. Generally, I've found the 2.25bpw EXL2 to be better than other low-bit quantizations that fit in 24GB of VRAM. These quants were made with the latest version of llama.cpp, with the llama3 changes upstreamed.
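If anyone wants to sanity-check the multi-part GGUF once it's up, llama-cpp-python can load it by pointing at the first shard; recent llama.cpp builds should locate the remaining parts automatically. The filename and parameters below are illustrative, not the actual repo contents:

```python
from llama_cpp import Llama

# Hypothetical first shard name - the remaining -0000N-of-0000M.gguf files are
# expected to sit next to it and be picked up automatically.
llm = Llama(
    model_path="dolphin-2.9-llama3-70b.Q4_K_M-00001-of-00002.gguf",
    n_gpu_layers=-1,   # offload as many layers as fit to the GPU
    n_ctx=4096,
)

# Quick coherence check - gibberish here would suggest a bad quant.
out = llm("Write one sentence about dolphins.", max_tokens=64)
print(out["choices"][0]["text"])
```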