fp16 (and maybe gguf)
#1 by trollkotze · opened
yo, why not upload fp16 so people can quant it in all formats?
also a description of the method would be nice, so others can reproduce it. thanks!
70b next please
Yes, many of us can't run exl2.
fp32, fp16 or gguf please
I believe this is the 6.0 BPW version, judging by its size. Could we get a 6.5 BPW version as well, or the FP16 so we can quant it ourselves? Thanks for the model, by the way!
FP16 is all we truly need; I understand requanting takes a lot of compute. If the community has the FP16, the usual quantizers can take it from there.
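For reference, the usual FP16-to-GGUF route goes through llama.cpp's converter and quantizer. A minimal sketch (the model directory and output names below are placeholders, not paths from this thread; run from a llama.cpp checkout):

```shell
# Convert the FP16 HF checkpoint to a GGUF file (placeholder paths).
python convert_hf_to_gguf.py ./model-fp16 --outfile model-f16.gguf

# Quantize the f16 GGUF down to a smaller format, e.g. Q4_K_M.
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```

From the same f16 GGUF you can emit any of the supported quant types (Q8_0, Q5_K_M, etc.), which is why people only need the FP16 upload.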
Allows the model to run on hardware other than the latest GPUs, and allows proper quanting to future formats, so the model isn't lost to time.