Title says 70B, but files are for 8B
Hi,
Looks like the 8B files were uploaded instead of the 70B ones, folks.
Cheers,
Will
It also looks like only the first of the multi-part files was quantised or posted.
Hey folks,
My bad, the autoquant step for GGUF was still pointing at 8B. I'm uploading some of the 70B quants now, though I'm still figuring out the best way to shard the larger models (if GGUF supports it) - rough sketch below.
Thanks for your patience!
Ethan
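For the sharding question: recent llama.cpp builds ship a gguf-split tool that breaks a large GGUF into size-capped parts (and can merge them back). A minimal sketch, assuming a local llama.cpp build and hypothetical file names - the tool name, flags, and paths aren't from this repo:

```python
import subprocess
from pathlib import Path

# Hypothetical local paths - adjust to wherever llama.cpp and the quant live.
GGUF_SPLIT = Path("llama.cpp/gguf-split")          # assumed tool name in recent llama.cpp builds
SRC = Path("dolphin-2.9-llama3-70b.Q4_K_M.gguf")   # hypothetical 70B quant file
OUT_PREFIX = "dolphin-2.9-llama3-70b.Q4_K_M"       # shards come out as <prefix>-0000N-of-0000M.gguf

# Split into parts no larger than ~45 GB so each shard stays under common
# upload limits; --split-max-size is assumed to accept a size string like "45G".
subprocess.run(
    [str(GGUF_SPLIT), "--split", "--split-max-size", "45G", str(SRC), OUT_PREFIX],
    check=True,
)

# List the resulting shards.
print(sorted(Path(".").glob(f"{OUT_PREFIX}-*.gguf")))
```

On recent builds, loading the first shard should pull in the rest automatically, so nobody has to merge the parts before use.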
In the meantime, I would recommend trying the EXL2 quants, which are definitely the 70B :D
No worries man, thanks for all the hard work!
There is one other set of GGUFs available that people can try too: https://huggingface.co/mradermacher/dolphin-2.9-llama3-70b-GGUF
EDIT: the mradermacher version is incoherent, don't use it. Best to use crusoeai's one when it's out.
Fixed, please message me or open an issue if you run into anything. Generally, I've found the 2.25bpw EXL2 to be better than other low-bit quantizations that fit in 24GB of VRAM. These quants were made with the latest version of llama.cpp, with the llama3 changes upstreamed.
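If anyone wants to sanity-check the multi-part GGUF once it's up, llama-cpp-python can load it by pointing at the first shard; recent llama.cpp builds should locate the remaining parts automatically. The filename and parameters below are illustrative, not the actual repo contents:

```python
from llama_cpp import Llama

# Hypothetical first shard name - the remaining -0000N-of-0000M.gguf files are
# expected to sit next to it and be picked up automatically.
llm = Llama(
    model_path="dolphin-2.9-llama3-70b.Q4_K_M-00001-of-00002.gguf",
    n_gpu_layers=-1,   # offload as many layers as fit to the GPU
    n_ctx=4096,
)

# Quick coherence check - gibberish here would suggest a bad quant.
out = llm("Write one sentence about dolphins.", max_tokens=64)
print(out["choices"][0]["text"])
```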