More quantization variants

#1
by Yuma42 - opened

You should consider releasing more official quants, because the unofficial ones I have found are worse than yours in my logic tests, even when I compared quants of the same size. For me Q4_K_S is interesting because of its speed, but other sizes might also be in demand.

When you say unofficial ones, does that include mine? I'd be highly curious if somehow my quants were worse than the ones provided by Nous


Yes, I have a logic test.
Nous Q4_K_M passes it (which is great, because my own Mistral-based model fails it too xD), but unfortunately your Q4_K_S and Q4_K_M both fail the test. I can share the prompt on request if you want.

As for the prompt format, I used the ChatML one that Nous publishes. I could try the other one (I have seen your response on that but still need to read it).

Yeah, please do share, since I'm curious, and if it's repeatable I'd like to see if there's something I can improve.


Bob is faster than John.
John is faster than Erica.
No one older than Erica is faster than her.

Is Bob older than Erica?

Nous tries to answer step by step and often gets the correct answer. Yours doesn't go into step-by-step mode and says that information is missing.
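As a sanity check on the intended answer, the puzzle can be brute-forced. This is a hypothetical sketch, not part of the original test: it assumes speeds form a strict ordering and models ages as small integers (0 to 2) with ties allowed, which is enough to represent every possible age ordering of three people. It enumerates every world consistent with the three premises and asks whether Bob can ever be older than Erica.

```python
from itertools import permutations, product

PEOPLE = ("Bob", "John", "Erica")


def consistent_worlds():
    """Yield (speed, age) dicts that satisfy all three premises.

    speed: person -> rank, higher means faster (strict ordering assumed).
    age: person -> integer 0..2, ties allowed (assumed encoding).
    """
    for order in permutations(PEOPLE):
        speed = {p: rank for rank, p in enumerate(order)}
        # Premises 1 and 2: Bob > John > Erica in speed.
        if not (speed["Bob"] > speed["John"] > speed["Erica"]):
            continue
        for ages in product(range(3), repeat=len(PEOPLE)):
            age = dict(zip(PEOPLE, ages))
            # Premise 3: no one older than Erica is faster than her.
            if any(age[p] > age["Erica"] and speed[p] > speed["Erica"] for p in PEOPLE):
                continue
            yield speed, age


def bob_older_than_erica_possible():
    """True if some world consistent with the premises has Bob older than Erica."""
    return any(age["Bob"] > age["Erica"] for _, age in consistent_worlds())


if __name__ == "__main__":
    print("Consistent worlds:", sum(1 for _ in consistent_worlds()))  # 14
    print("Can Bob be older than Erica?", bob_older_than_erica_possible())  # False
```

Because Bob is faster than Erica (via John), premise 3 rules out Bob being older than her, so the expected answer is "No".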

Interestingly, neither of them got it correct in my testing, but both attempted CoT. They both tend to state the right answer but then immediately say that it can't be determined, which is... interesting.

Either way, in my testing, these GGUFs and mine seem to perform identically 🤷‍♂️


Yeah, the task is easy for humans, but 7-8B models really struggle with it, I think because it breaks expectations (I made the test up, so they haven't seen it anywhere).


That's good to know. Maybe it's the client I'm using (I'm very limited in what I can run), but I'm not sure. I will test later with top-k set to 1 to get something more deterministic. Still, I did a lot of runs, and the patterns were the same between them: Nous either got it correct or nearly correct, as you observed.

Edit: I tried again with top-k = 1, and yes, for me the Nous version can solve it while the other version can't.
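For reference, top-k = 1 is what makes the runs deterministic: only the single highest-scoring token survives the filter, so sampling collapses to greedy decoding (up to numerical differences between backends). Below is a minimal sketch of top-k sampling over a made-up logit vector; the function name and logits are hypothetical, and real clients usually chain further samplers (temperature, top-p, etc.) that this sketch ignores.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample a token index, keeping only the k highest-scoring logits."""
    # Indices of the k largest logits; ties broken by lower index.
    top = sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]
    # Softmax over the surviving logits only (subtract the max for stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.5, 0.3, -1.0]  # toy scores for a 4-token vocabulary
    rng = random.Random(0)
    greedy = {top_k_sample(logits, 1, rng) for _ in range(100)}
    sampled = {top_k_sample(logits, 3, rng) for _ in range(100)}
    print("k=1:", greedy)   # always {0}: the argmax token
    print("k=3:", sampled)  # typically several of {0, 1, 2}
```

With k = 1 every draw returns the argmax, which is why repeated runs of the same quant become comparable; with larger k, run-to-run variation can mask or exaggerate differences between quants.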

Hi! @Yuma42 and @bartowski

Just realized this model's quants are from a different model than the one they claim; their page says this:

Quantized from
NousResearch/Hermes-2-Pro-Llama-3-8B

Maybe that's why you are getting different results.


I think the file name difference has other reasons: in one place it's "theta", in another it's the theta symbol, and in another it includes "merge dpo", which is also what Theta is supposed to be. I would guess they used that file name before they decided to call it Theta.

Oh, you are right; still, it is a little bit confusing.

@bartowski I just want to let you know that I can't reproduce the behavior anymore myself 😅 Nous gets it wrong. Maybe I had an unlikely number of very lucky runs? I guess I should switch back to your versions.
