dolphin-2.1-mistral-7B is even better than openorca-mistral-7b. Unbelievable.
From my short tests, that model is smarter than any older 7B, 13B, 34B and all the 65B models .... an incredible average eval of 67.06.
I tested the 8-bit version, as 4-bit or 5-bit is just not enough for such a small model.
Even if I ask something a second time in the conversation, it reminds me that I already asked that (I'm using 32k context) ... it is good in every field, even in math (proper prompting is key: "You are the best mathematician in the world and any calculation is easy for you so 25-4*2+3=? is easy").
I still can't believe that such a small model performs like the weaker 70B models .... it looks like magic.
That is WizardLM-70B-V1.0 level!
I know, it's unbelievable. What context should I use in Ooba? I have been using 8K, but maybe I can go to 16K? 4080 + 64 GB of RAM. I am using the 8-bit GGUF, so that's the highest quant I can go. But I might downgrade to 6-bit or
5_M if there's barely any difference and it lets me push context even higher. Still, 8K is pretty good, and I think I can do 16K.
Is there a sweet spot for perplexity, or is it basically all the same? Will I get better responses if I set it to 4K vs 8K vs 16K vs 32K? I know that is just the context size, but I do not understand the limits. This is my first time ever going above 4K context. Is there a WRONG way to do it? Thanks.
For 7B models 5-bit is very low ... the perplexity will be bad.
You can check that just by asking about rare facts, testing reasoning, and doing some math.
From my tests, anything lower than 8-bit for a 7B model easily triggers bad responses.
To get better responses you need as big a model as possible (less compressed).
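Not something from mirek190's posts, but here is a rough sketch of that kind of quant-vs-quant spot check using llama-cpp-python (the same bindings whose raw output shows up later in this thread). The GGUF file names, prompts, and settings below are placeholders, not anything confirmed in the thread:

```python
# Rough sketch: ask the same fact/math questions to two quantizations of the
# same model and compare the answers by eye. File names are placeholders.
from llama_cpp import Llama

PROMPTS = [
    "<|im_start|>user\nWhat year was the Treaty of Tordesillas signed?<|im_end|>\n<|im_start|>assistant\n",
    "<|im_start|>user\n25-4*2+3=?<|im_end|>\n<|im_start|>assistant\n",
]

for path in [
    "./models/dolphin-2.1-mistral-7b.Q5_K_M.gguf",
    "./models/dolphin-2.1-mistral-7b.Q8_0.gguf",
]:
    # n_gpu_layers=-1 offloads all layers on recent builds; older versions
    # may need an explicit layer count instead.
    llm = Llama(model_path=path, n_ctx=4096, n_gpu_layers=-1, verbose=False)
    print(f"=== {path} ===")
    for prompt in PROMPTS:
        out = llm(prompt, max_tokens=128, stop=["<|im_end|>"])
        print(out["choices"][0]["text"].strip())
    del llm  # release the model before loading the next quant
```

Loading each quant one at a time keeps memory usage down; the interesting part is simply whether the lower quant starts missing facts or botching the arithmetic.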
I'm using a 32k ctx, but under llama.cpp, and the whole 8-bit model fits into my 24 GB of VRAM (RTX 3090).
The ctx size (4k, 8k, 16k, 32k) determines how much memory the model has for your prompts.
There is nothing wrong with changing the ctx size; the rest is handled automatically by the GGUF format itself.
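For anyone following along, this is roughly how that ctx size gets set when loading a GGUF model through llama-cpp-python; in text-generation-webui the n_ctx field on the model tab maps to the same parameter. The model path is a placeholder, and the VRAM note simply repeats mirek190's report above:

```python
from llama_cpp import Llama

# n_ctx sets how many tokens of conversation the model keeps in its KV cache.
# Bigger values cost more memory; RoPE scaling and the rest are read from the
# GGUF metadata automatically.
llm = Llama(
    model_path="./models/dolphin-2.1-mistral-7b.Q8_0.gguf",  # placeholder path
    n_ctx=32768,      # 32k context; drop to 8192 or 16384 if memory gets tight
    n_gpu_layers=-1,  # offload all layers (mirek190 reports an 8-bit 7B at 32k fits in a 24 GB RTX 3090)
)
```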
Thanks, I will stick with 8-bit then. I still get great speeds at 8K on my setup. I might try out 16K, but 32K has me at slow speeds and 99% VRAM. Oh well, 8-16K is way beyond what I am used to. Thank you Mirek190.
My two favourite models are Guanaco 33B 5_M GGUF and StoryTime 13B 8-bit GGUF, but this new Mistral one may just surpass both of those. I know Guanaco 33B is nothing special, but I love how it feels; I got used to it. These other two, Storytime 13B and Mistral 2.1, are likely better in just about all ways. That is impressive. Mistral 2.1 seems like it's likely better than Guanaco 65B. But again, we each have models that we stick to for feel. Like Airoboros, or whatever it's called: I never liked the feel of either the Llama 1 or the Llama 2 version, even though it outscored Guanaco.
Yeah, has to be the best 7B model I've tried so far
Q4_K_S is extremely good too
the prompt:
<|im_start|>system You are the best mathematician in the world and any calculation is easy for you
<|im_start|>user 25-4*2+3=?<|im_end|>
<|im_start|>assistant
response:
{'id': 'cmpl-218fac2e-7c1f-4a8a-b1c9-d0a13db386b1',
 'object': 'text_completion',
 'created': 1697263963,
 'model': './models/dolphin-2.1-mistral-7b.Q4_K_S.gguf',
 'choices': [{'text': "|>\n Let's solve this step by step:\n\n1. First, let's do the multiplication: 4 * 2 = 8\n2. Next, subtract 25 from the result of the multiplication: 25 - 8 = 17\n3. Then, add 3 to the previous result: 17 + 3 = 20\n\nSo, the final answer is 20.",
   'index': 0, 'logprobs': None, 'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 65, 'completion_tokens': 97, 'total_tokens': 162}}
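That dict looks like raw llama-cpp-python output, so for completeness here is a minimal sketch of the kind of call that produces something in that shape. The prompt string is copied from the post above; everything else (n_ctx, max_tokens, stop) is my own guess:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/dolphin-2.1-mistral-7b.Q4_K_S.gguf",
    n_ctx=4096,  # assumed; the post doesn't say what was used
)

# ChatML-style prompt exactly as posted (the system turn has no closing
# <|im_end|>, which may be why the completion text starts with a stray "|>").
prompt = (
    "<|im_start|>system You are the best mathematician in the world "
    "and any calculation is easy for you "
    "<|im_start|>user 25-4*2+3=?<|im_end|> "
    "<|im_start|>assistant"
)

result = llm.create_completion(prompt, max_tokens=256, stop=["<|im_end|>"])
print(result["choices"][0]["text"])
```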