Q6_K vs. Q5_K_L
Is there much of a difference in terms of perplexity?
Trying to decide between the MacBook Pro M4 Max 128GB and the 64GB variant. I would only have to sell one kidney if I were able to run the Q5_K_L on the 64GB version. Unsure if the Q6_K, at a 57GB file size, would slow the 64GB machine to a crawl.
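For context, here's the napkin math I've been doing. It's only a sketch: the KV-cache figures assume a 70B Llama-3.1-style architecture (80 layers, 8 KV heads, head dim 128, fp16 cache), and the 75% figure is macOS's default cap on GPU-wired memory, which I gather can be raised with `sudo sysctl iogpu.wired_limit_mb=...`.

```python
GIB = 1024**3

def fits(file_size_gb: float, ram_gb: int, n_ctx: int = 8192,
         gpu_frac: float = 0.75) -> None:
    """Rough fit check: quant file size + KV cache vs. usable unified memory."""
    # K+V * layers * kv_heads * head_dim * fp16 (assumed 70B Llama-3.1 shape)
    kv_bytes_per_token = 2 * 80 * 8 * 128 * 2
    need = file_size_gb * GIB + n_ctx * kv_bytes_per_token
    usable = ram_gb * GIB * gpu_frac
    print(f"{file_size_gb}GB quant on {ram_gb}GB Mac: need ~{need / GIB:.0f} GiB, "
          f"usable ~{usable / GIB:.0f} GiB -> {'fits' if need < usable else 'tight/no'}")

fits(57, 64)                 # Q6_K on the 64GB machine, default wired limit
fits(50, 64, gpu_frac=0.90)  # Q5_K_L on 64GB with the wired limit raised
fits(57, 128)                # Q6_K on the 128GB machine
```

By that estimate the Q6_K looks like a non-starter on the 64GB machine without aggressive tweaking, while the Q5_K_L seems doable if the wired limit is raised.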
Q5_K_L should be more than enough for any use case, but I can run some PPL numbers for you on this specific model if you'd like.
Wikitext, or would you prefer a specific dataset?
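And in case you want to reproduce it locally, this is roughly what I'd run. Just a sketch around llama.cpp's `llama-perplexity` tool; the GGUF filenames and the wikitext path are placeholders for wherever you keep them.

```python
import subprocess

# Run llama.cpp's llama-perplexity over wikitext-2 for each quant:
# -m is the GGUF, -f the raw test text, and -ngl 99 offloads all
# layers to the GPU (Metal on Apple Silicon).
for quant in ("Q6_K", "Q5_K_L", "Q4_K_L"):
    subprocess.run(
        [
            "./llama-perplexity",
            "-m", f"Llama-3.1-Nemotron-70B-Instruct-HF-{quant}.gguf",
            "-f", "wikitext-2-raw/wiki.test.raw",
            "-ngl", "99",
        ],
        check=True,
    )
```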
Heya, apologies for the late reply.
If I were to run the Q4_K_L quant instead (40GB file size, using Q8_0 for the embed and output weights)... how much of a difference would there be between that and the Q5_K_L? I'm now considering getting the M4 MacBook Pro with 48GB.
However, if I do, the Q5_K_L won't work: it's 50GB in file size. It should fit if I were to get the MacBook Pro M4 Max 64GB instead (which is a lot more expensive).
I use this for writing purposes (work) and need the model to be intelligent.
To answer your question... I have no idea what dataset to request. I suppose whichever one best reflects professional writing (website content, blog posts, etc.).
Thanks!
Hi, I have the same question, as I'm trying to use GGUF quants for some model evaluation. I want to load the model into vLLM, and I'm wondering whether evaluation done on Llama-3.1-Nemotron-70B-Instruct-HF-Q5_K_S.gguf is an accurate enough reflection of the model's ability. Thank you!
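For reference, this is roughly how I'm trying to load it. A sketch assuming vLLM's experimental GGUF support, which as I understand it wants a single-file GGUF plus the tokenizer from the original unquantized repo:

```python
from vllm import LLM, SamplingParams

# Experimental GGUF path in vLLM: point `model` at a single-file GGUF
# and borrow the tokenizer from the original (unquantized) HF repo.
llm = LLM(
    model="Llama-3.1-Nemotron-70B-Instruct-HF-Q5_K_S.gguf",
    tokenizer="nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
)
outputs = llm.generate(
    ["Write one sentence introducing a blog post about espresso."],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```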