Lack of 33B models?
Hey hope all is well, I've been gone for a bit and I see you've been churning out an insane amount of models every day - yer nuts. :)
Anyway, wondering why there's such a clear lack of 33B models coming out (in general)?
All I see now is either 70B, which is unusable for most enthusiasts with 24GB of VRAM and really needs at least 48GB, or 7B/13B, which are lower-end. 33B seemed like a sweet spot: it fills 24GB nicely and offers the best of both worlds.
Sorry to post this into an unrelated model, but there's no contact point on HF to email you directly.
Thanks!
GGML versions of 70B work great with 24GB cards :D
I was thinking about that too. Does that mean you can put some layers on the GPU and run the rest on a multi-core CPU? I'd like the ability to experiment more with 30B+ models as well.
Which model are you talking about? They are all over 24GB in size.
Unless you have multiple 3090s or you don't load all of the layers, I don't see how you'd be content with a 70B model at 5-6 t/s, considering smaller models produce 50+ t/s.
I meant GGML format, of course.
For instance, with q4_K_M (mostly 4-bit weights) and 40 layers offloaded to the GPU, I get 2.5 t/s.
With something smaller like a 33B q4_K_M GGML I get 30 t/s, since all the layers fit on an RTX 3090.
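For anyone who hasn't tried partial offloading, here's a minimal sketch using llama-cpp-python (just one of several ways to run GGML models). The model path, layer count, and thread count are purely illustrative, not settings from this thread - adjust them to your own hardware:

```python
# Minimal sketch of partial GPU offloading with llama-cpp-python
# (pip install llama-cpp-python, built with CUDA/cuBLAS support).
# File name and numbers below are illustrative assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-70b.ggmlv3.q4_K_M.bin",  # hypothetical local GGML file
    n_gpu_layers=40,  # layers offloaded to the GPU; the rest run on the CPU
    n_threads=8,      # CPU threads used for the non-offloaded layers
    n_ctx=2048,       # context window
)

out = llm("Explain GPU layer offloading in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

The big speedups come when n_gpu_layers is at least the model's total layer count, i.e. everything fits in VRAM - which is exactly why a 33B quant on a 24GB card is so attractive.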
But this is what I'm talking about - there aren't any 33B models out there lately; it's all either 13B or 70B.
IMHO 33B is the sweet spot for the 24GB VRAM cards (e.g. a 3090) that most people can actually afford or find. The 48GB cards (A6000, etc.) are at least 4-5x more expensive than a 3090.
Because they're Llama 2 based, and Llama 2 only comes in 7B, 13B, and 70B.