Would love to try a quantized version!
If you, or anyone you can reach out to, could quantize this model to GGUF, I'd be very happy!
Oh wait, I found this. Is that the same as this one?
It's not the same model, but it could be good! There have been a lot of issues regarding GGUF and Llama 3, but I'll look into it if the model has OK scores on the Open LLM Leaderboard.
Ok interesting. Thanks!
I second the need for GGUF quants.
I have taken an interest in this.
Damn, well I guess that means it's going to have a GGUF soon; Eric doesn't mess around! Thank you!!! Can't wait to try it!
This thing is smarter than Opus.
Working on uploading it; it'll be tomorrow.
Amazing, thanks @ehartford :)
What's the difference between this model and imi2's model? The merge config is the same.
Hope to see EXL2 and eventually a 103B. I actually liked the 103Bs more than the 120Bs.
This is hype; I am itching to try this!
I believe it's probably inspired by Goliath, so imi2 probably used the same method as this repo.
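For anyone curious what that method looks like in practice: Goliath-style frankenmerges are usually built with mergekit's passthrough merge, stacking overlapping layer ranges of the base model back to back. A rough sketch of such a config, where the layer ranges are made up for illustration and are not the actual ones used by this repo or imi2's:

```yaml
# Illustrative mergekit passthrough config for a Goliath-style self-merge.
# Layer ranges below are placeholders, not the real ranges used here.
slices:
  - sources:
      - model: meta-llama/Meta-Llama-3-70B-Instruct
        layer_range: [0, 20]
  - sources:
      - model: meta-llama/Meta-Llama-3-70B-Instruct
        layer_range: [10, 30]
  - sources:
      - model: meta-llama/Meta-Llama-3-70B-Instruct
        layer_range: [20, 40]
merge_method: passthrough
dtype: float16
```

If both repos really do use the same slices and the same base checkpoint, the resulting weights should be essentially identical, since a passthrough merge is deterministic.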
Took a shot at GGUF: QuantFactory/Meta-Llama-3-120B-Instruct-GGUF
Let me know if this works as expected.
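If anyone wants a quick sanity check before committing to a full chat session, here is a minimal sketch using llama-cpp-python; the file name, context size, and GPU settings are assumptions, so adjust them to whichever quant you download:

```python
# Minimal sanity check with llama-cpp-python (pip install llama-cpp-python).
# The model_path below is a hypothetical file name -- point it at the quant you grabbed.
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3-120B-Instruct.Q4_K_M.gguf",  # assumed file name
    n_gpu_layers=-1,   # offload as many layers as fit; lower this on smaller GPUs
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Briefly explain what a frankenmerge is."}]
)
print(out["choices"][0]["message"]["content"])
```

If it loads and produces coherent text, the quant at least isn't broken at the metadata/tokenizer level.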
Okay, I can hopefully try it later today! Thanks!
Probably related to how terribly Llama 3 handles being quantized.
I tried a lot of these merged models, and below Q4 they were not better. At least 3.75 bpw or it's not worth it.
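For a rough sense of what that threshold means in download size for a ~120B-parameter merge, here is some back-of-the-envelope arithmetic; the parameter count is approximate and real GGUF files add overhead for embeddings and metadata:

```python
# Back-of-the-envelope size estimate for a ~120B-parameter model at different
# average bits-per-weight (bpw). Real quants differ somewhat because not all
# tensors are quantized equally and files carry extra metadata.
params = 120e9

for bpw in (3.0, 3.75, 4.5, 5.0):
    size_gb = params * bpw / 8 / 1e9  # bits -> bytes -> GB (decimal)
    print(f"{bpw:.2f} bpw ≈ {size_gb:.0f} GB")
```

That puts 3.75 bpw at roughly 56 GB of weights, which is in the same ballpark as the 60 GB+ downloads mentioned further down for Q4-ish quants.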
Cool! Thanks Eric!
Time to rent a RunPod machine again... LOL
I think I can officially close this, since multiple people have posted quants, thus solving the issue.
Do you have any resources on hand about Llama 3's quantization issues? I've noticed this too and would like to dig deeper into why it's happening.
I have seen multiple Reddit threads talking about it on LocalLLaMA. I believe it was also mentioned here on HF. As far as the details, I do not know. Sorry!
GGUF still has issues; they keep cropping up. Hence I am wary of downloading 60 GB+ of it. EXL2 didn't appear to suffer from this problem. Every time there are "quantization" issues, the user is always running llama.cpp.
Interesting, perhaps the issue lies with llama.cpp itself and not the model.