Lack of 33B models?
Hey hope all is well, I've been gone for a bit and I see you've been churning out an insane amount of models every day - yer nuts. :)
Anyway, wondering why there's such a clear lack of 33B models coming out (in general)?
All I see now is either 70B, which is unusable for most enthusiasts with 24GB of VRAM and really needs at least 48GB, or 7B/13B, which are lower-end. 33B seemed like a sweet spot: it fills 24GB nicely and offers the best of both worlds.
Sorry to post this into an unrelated model, but there's no contact point on HF to email you directly.
Thanks!
GGML versions of 70B work great with 24GB cards :D
I was thinking about that too. Does that mean you can put some layers on the GPU and run the rest on a multi-core CPU? I'd like the ability to experiment more with 30B+ models as well.
Which model are you talking about? They are all over 24GB in size.
Unless you have multiple 3090s or you don't load all of the layers, I don't see how you'd be content with a 70B model at 5-6 t/s, considering smaller models produce 50+ t/s.
I meant GGML format, of course.
For instance, with q4_K_M (mostly 4-bit weights) and 40 layers offloaded to the GPU, I get 2.5 t/s.
With something smaller like a 33B q4_K_M GGML I get 30 t/s, since all the layers fit on an RTX 3090.
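For anyone who hasn't tried partial offloading, here's a minimal sketch using llama-cpp-python (just one of several ways to run GGML models). The model path, layer count, and thread count are purely illustrative, not settings from this thread - adjust them to your own hardware:

```python
# Minimal sketch of partial GPU offloading with llama-cpp-python
# (pip install llama-cpp-python, built with CUDA/cuBLAS support).
# File name and numbers below are illustrative assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-70b.ggmlv3.q4_K_M.bin",  # hypothetical local GGML file
    n_gpu_layers=40,  # layers offloaded to the GPU; the rest run on the CPU
    n_threads=8,      # CPU threads used for the non-offloaded layers
    n_ctx=2048,       # context window
)

out = llm("Explain GPU layer offloading in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

The big speedups come when n_gpu_layers is at least the model's total layer count, i.e. everything fits in VRAM - which is exactly why a 33B quant on a 24GB card is so attractive.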
But this is what I'm talking about - there aren't any 33B models out there lately; it's all either 13B or 70B.
IMHO 33B is the sweet spot for the 24GB VRAM cards (e.g. a 3090) that most people can actually afford or find. The 48GB cards (A6000, etc.) are at least 4-5x more expensive than a 3090.
Because they're Llama 2 based, and Llama 2 only comes in 7B, 13B, and 70B.