Any plan for 70B?
Hello, do you plan to release 70B?
I think yes, because the model card says it and the 70B folder was renamed to:
Meta-Llama-3-70B-Instruct-GGUF-old
Yup, just having trouble with the server that was running it. I transferred several off, but then it crashed and I need to get it back up.
I think it would make sense to test perplexity of the models beforehand, as there are allegedly issues with imatrix and I-quants.
There are perplexity issues, but absolutely no generation issues.
Even the exl2 gets weirdly high 7+ PPL, but it runs great. I almost feel the instruct tune is SO sensitive to its prompt template that it goes off the rails if it doesn't have it.
I've found in using it that, unlike other models, which only slightly misbehave when they don't have their template, this one will go absolutely nuts and generate infinitely. That's likely not good for perplexity.
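For reference, this is roughly the format the instruct tune expects (I'm going from memory here, so double-check against the official model card before relying on it):

```python
# Rough sketch of the Llama 3 Instruct prompt format, from memory --
# verify the exact special tokens against the official model card.
def format_llama3_prompt(system: str, user: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(format_llama3_prompt("You are a helpful assistant.", "Hello!"))
```

Feed it raw wiki text with none of that scaffolding and it seems far more likely to wander off than other instruct models.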
Either way, PPL on wiki raw is a weak test of a model's performance; use the model if you like it.
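For context, PPL is just the exponentiated average negative log-likelihood over the test tokens, so a handful of badly mispredicted tokens can inflate the number quite a bit even when normal generations look fine. A rough sketch (not llama.cpp's exact implementation):

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp of the average negative log-likelihood per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# A model that assigns each token probability ~0.5 has PPL ~2; a few tokens
# with tiny probabilities drag the average log-likelihood down fast, which
# is how off-template weirdness can show up as 7+ PPL on wiki raw.
print(perplexity([math.log(0.5)] * 100))  # -> 2.0
```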
As for the 70B version, getting close! Internet is being waaay too slow, been going all day long :') Just a one-off though, won't be doing it this way going forward.
Sorry it took so long! It's up now :)
https://huggingface.co/bartowski/Meta-Llama-3-70B-Instruct-GGUF
Thank you for the great work. Q5_K_M is the best I can use with my CPU/RAM; I think imatrix could have benefits there. For smaller models I use Q6 or even Q8.
These are with imatrix btw :)
Yes, that's why I use them. :-)