Q6_K vs. Q5_K_L
Is there much of a difference in terms of perplexity?
Trying to decide between the MacBook Pro M4 Max 128GB and the 64GB variant. I would only have to sell one kidney if I were able to run the Q5_K_L on the 64GB version. Unsure if the Q6_K, at a 57GB file size, would slow the 64GB machine to a crawl.
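For context, here's the napkin math I've been doing. It's only a sketch: the KV-cache figures assume a 70B Llama-3.1-style architecture (80 layers, 8 KV heads, head dim 128, fp16 cache), and the 75% figure is macOS's default cap on GPU-wired memory, which I gather can be raised with `sudo sysctl iogpu.wired_limit_mb=...`.

```python
GIB = 1024**3

def fits(file_size_gb: float, ram_gb: int, n_ctx: int = 8192,
         gpu_frac: float = 0.75) -> None:
    """Rough fit check: quant file size + KV cache vs. usable unified memory."""
    # K+V * layers * kv_heads * head_dim * fp16 (assumed 70B Llama-3.1 shape)
    kv_bytes_per_token = 2 * 80 * 8 * 128 * 2
    need = file_size_gb * GIB + n_ctx * kv_bytes_per_token
    usable = ram_gb * GIB * gpu_frac
    print(f"{file_size_gb}GB quant on {ram_gb}GB Mac: need ~{need / GIB:.0f} GiB, "
          f"usable ~{usable / GIB:.0f} GiB -> {'fits' if need < usable else 'tight/no'}")

fits(57, 64)                 # Q6_K on the 64GB machine, default wired limit
fits(50, 64, gpu_frac=0.90)  # Q5_K_L on 64GB with the wired limit raised
fits(57, 128)                # Q6_K on the 128GB machine
```

By that estimate the Q6_K looks like a non-starter on the 64GB machine without aggressive tweaking, while the Q5_K_L seems doable if the wired limit is raised.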
Q5_K_L should be more than enough for any use case, but I can run some PPL numbers for you on this specific model if you'd like.
Wikitext, or would you prefer a specific dataset?
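And in case you want to reproduce it locally, this is roughly what I'd run. Just a sketch around llama.cpp's `llama-perplexity` tool; the GGUF filenames and the wikitext path are placeholders for wherever you keep them.

```python
import subprocess

# Run llama.cpp's llama-perplexity over wikitext-2 for each quant:
# -m is the GGUF, -f the raw test text, and -ngl 99 offloads all
# layers to the GPU (Metal on Apple Silicon).
for quant in ("Q6_K", "Q5_K_L", "Q4_K_L"):
    subprocess.run(
        [
            "./llama-perplexity",
            "-m", f"Llama-3.1-Nemotron-70B-Instruct-HF-{quant}.gguf",
            "-f", "wikitext-2-raw/wiki.test.raw",
            "-ngl", "99",
        ],
        check=True,
    )
```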
Heya, apologies for the late reply.
If I were to run the Q4_K_L quant instead (40GB file size, using Q8_0 for the embed and output weights)... how much of a difference would there be between that and the Q5_K_L? I'm now considering getting the M4 MacBook Pro with 48GB.
However, if I do, the Q5_K_L won't work: it's 50GB in file size. It should fit if I were to get the MacBook Pro M4 Max 64GB instead (which is a lot more expensive).
I use this for writing purposes (work) and need the model to be intelligent.
To answer your question... I have no idea what dataset to request. I suppose whichever one best reflects professional writing (website content, blog posts, etc.).
Thanks!
Hi, I have the same question, as I'm trying to use GGUF quants for some model evaluation. I want to load the model into vLLM, and I'm wondering whether evaluation done on Llama-3.1-Nemotron-70B-Instruct-HF-Q5_K_S.gguf is an accurate enough reflection of the model's ability. Thank you!
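For reference, this is roughly how I'm trying to load it. A sketch assuming vLLM's experimental GGUF support, which as I understand it wants a single-file GGUF plus the tokenizer from the original unquantized repo:

```python
from vllm import LLM, SamplingParams

# Experimental GGUF path in vLLM: point `model` at a single-file GGUF
# and borrow the tokenizer from the original (unquantized) HF repo.
llm = LLM(
    model="Llama-3.1-Nemotron-70B-Instruct-HF-Q5_K_S.gguf",
    tokenizer="nvidia/Llama-3.1-Nemotron-70B-Instruct-HF",
)
outputs = llm.generate(
    ["Write one sentence introducing a blog post about espresso."],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```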