Title says 70B, but files are for 8B

#1 by FiditeNemini - opened

Hi,

Looks like the 8B files were uploaded instead of the 70B ones, folks.

Cheers,
Will

It looks like they only quantised or posted the first of the many parts.

Crusoe AI org

Hey folks,

My bad, the autoquant step for GGUF was still pointing at 8B. I'm uploading some of the 70B quants now, though I'm still figuring out the best way to shard the larger models (if GGUF supports it).
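
If it helps anyone following along, this is roughly what I'm trying with llama.cpp's gguf-split tool. A sketch only; the file names below are placeholders, not the final upload names:

```bash
# gguf-split ships with llama.cpp; split a large GGUF into shards
# capped at ~15 GB each (file names here are placeholders)
./gguf-split --split-max-size 15G \
    dolphin-2.9-llama3-70b.Q4_K_M.gguf \
    dolphin-2.9-llama3-70b.Q4_K_M

# This writes dolphin-2.9-llama3-70b.Q4_K_M-00001-of-0000N.gguf etc.;
# llama.cpp then loads the full model when pointed at the first shard:
./main -m dolphin-2.9-llama3-70b.Q4_K_M-00001-of-0000N.gguf -p "Hello"
```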

Thanks for your patience!
Ethan

Crusoe AI org

In the meantime, I would recommend trying the EXL2 quants which are definitely the 70B :D

No worries man, thanks for all the hard work!

There is one other set of GGUF quants available that people can try too: https://huggingface.co/mradermacher/dolphin-2.9-llama3-70b-GGUF
EDIT: the mradermacher version is incoherent, don't use it. Best to use crusoeai's one when it's out.

Crusoe AI org

Fixed. Please message me or open an issue if you run into anything. Generally, I've found the 2.25bpw EXL2 to be better than other low-bit quantizations that fit in 24GB of VRAM. The GGUFs were made with the latest version of llama.cpp, with the llama3 changes upstreamed.
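
For anyone who wants a quick sanity check once the shards are down, something like this should work. The repo and file names are assumptions, so adjust to whatever actually landed:

```bash
# Grab the GGUF shards with huggingface-cli
# (repo id and quant pattern assumed, not confirmed)
huggingface-cli download crusoeai/dolphin-2.9-llama3-70b-GGUF \
    --include "*Q4_K_M*.gguf" --local-dir .

# Run against the first shard with a recent llama.cpp build so the
# upstreamed llama3 tokenizer changes are picked up
./main -m dolphin-2.9-llama3-70b.Q4_K_M-00001-of-0000N.gguf \
    -n 128 -p "Write a haiku about GPUs."
```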

3thn changed discussion status to closed
