superhot vs normal

#2
by Enferlain - opened

I think in koboldcpp you can use 4096 or maybe even more on models that aren't SuperHOT. Have you heard whether there is a performance/quality impact from using either version over the other at contexts higher than 2K?

I haven't heard of specific testing but yes, I would expect a quality drop. The benefit of the SuperHOT models is that they gain training on responses over 2048 tokens in length. Without that training, quality is expected to degrade on longer responses.

It may well still be usable though, so it's worth playing with.
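
If you want to experiment, a launch along these lines should do it (the model filename here is just a placeholder; --contextsize is the part that matters, and you'd raise the matching max context setting in the UI too):

```
python koboldcpp.py your-model.ggmlv3.q4_0.bin --contextsize 4096
```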

I tried this just before the SuperHOTs were released. The answer is no. It "works" in that it doesn't throw exceptions, but it will give you gibberish, mostly punctuation, or repeat the same few words over and over. 2K is the max for those.
The SuperHOTs work, even at 8K. But some models seem to behave better than others, and obviously the higher the "B" count, the better the results.

Ah, but if you tested before the SuperHOTs came out then I think you only tested by adjusting the UI setting, right? You didn't also have --contextsize 4096 on the command line, which is what applies the magic algorithm that should in theory work with any model.

That algorithm was only added after SuperHOT, because it was Kaio Ken's work on SuperHOT that showed everyone how to increase context with a very simple change to the code.

The SuperHOT models then also feature training on >2K prompts to boost their ability to answer them, but in theory it should also work without that, albeit probably not as well.
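
For anyone curious what that "very simple change" looks like, here is a rough NumPy sketch of the position interpolation idea. This is my own illustration, not koboldcpp's actual code, and the function name is made up:

```python
import numpy as np

def rope_angles(head_dim, positions, scale=1.0, base=10000.0):
    """Rotary position embedding angles. scale > 1 compresses the
    position indices so a longer sequence maps back into the 0..2K
    range the model was originally trained on."""
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    t = positions / scale  # the entire trick is this one division
    return np.outer(t, inv_freq)

# A plain 2K model: positions 0..2047 at scale 1.0.
# The 8K trick: positions 0..8191 at scale 4.0 land in the same range.
a2k = rope_angles(128, np.arange(2048), scale=1.0)
a8k = rope_angles(128, np.arange(8192), scale=4.0)
print(a2k[-1, 0], a8k[-1, 0])  # 2047.0 vs 2047.75 -- comparable
```

The SuperHOT fine-tuning then trains on long sequences with that scaling applied, which is why those models handle long prompts better than an unmodified model given the same trick.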

I'm not 100% certain, but I have batch file launchers with contextsize params. If you mean the models released within a window of a few days that said they were 4K capable, then those did work. But the ordinary 2K ones from before don't.

The contextsize params were added to my launchers for KoboldAI Lite when your READMEs called for it; before those READMEs I was using the Oobabooga parameters.
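
For context, each of my launchers is basically a one-liner like this (filename invented for the example):

```
koboldcpp.exe airoboros-13b-superhot-8k.ggmlv3.q4_K_M.bin --contextsize 8192
```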

So after some tests, and as TheBloke said, it would seem that the changes within the software itself allow for larger contexts, and so the older models do work at higher context. minotaur-15b.ggmlv3.q4_0.bin crashes with a memory access violation even though I have the space, but most have worked. There is a problem with knowing which format each model follows (Alpaca, Vicuna, WizardLM, etc.): there is no template functionality for KoboldAI as there is with Oobabooga, so one must go to the settings, remember which format is to be used, and enter it manually. At least with Oobabooga one can quickly change templates when the incorrect format was used. I mostly run on CPU, so results are EXTREMELY slow.
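
For reference, the Alpaca-style format that many of these fine-tunes expect looks roughly like this; it's what you end up typing into those settings boxes by hand:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
```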

Great, thanks for letting us know. Good to hear it's working for most of them.

And yes, I agree that the KoboldCpp UI trails behind text-generation-webui in areas like this. I've been told it was primarily designed as a storytelling UI, so things like instruct templates haven't received much attention. I really dislike how small those settings text boxes are, for one thing :) Not even big enough to display ### Instruction: properly.

Hopefully it'll improve over time.

I was running KoboldCpp 1.33. I see there's an update with improvements.
