---
license: cc-by-nc-4.0
library_name: transformers
pipeline_tag: text-generation
tags:
- HQQ
- mixtral
- moe
- quantized
- 2bit
---
|
|
|
## NeverSleep's [Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss](https://huggingface.co/NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss) 2 bit HQQ quant. |
|
## 18.2 GB |
|
### the other 14 shannons will be remembered. [HQQ quantized](https://mobiusml.github.io/hqq_blog/) to 2 bits with 4 bit attention. Fits on a 3090 with room to grow. Supports full 32k context. I will not combine those assertions. |
|
The attention tensors are 4 bit because Mixtral reuses them for every expert, so this only adds 0.4 GB and the quality improves dramatically. See [this](https://huggingface.co/mobiuslabsgmbh/Mixtral-8x7B-v0.1-hf-attn-4bit-moe-2bit-HQQ), but horny and dying of chatml m<|alig>|nant tokenitis.|>
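A quick back-of-the-envelope check of that 0.4 GB figure, as a minimal sketch. The shape values below are assumptions taken from the public Mixtral-8x7B config (hidden size 4096, 32 layers, 8 key/value heads of dimension 128), not from this card, and the estimate ignores the per-group scales and zero-points that HQQ also stores.

```python
# Assumed Mixtral-8x7B shape values (from the public model config, not this card).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128

# Grouped-query attention: k_proj and v_proj are narrower than q_proj and o_proj.
kv_dim = NUM_KV_HEADS * HEAD_DIM

# q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
attn_params_per_layer = 2 * HIDDEN_SIZE * HIDDEN_SIZE + 2 * HIDDEN_SIZE * kv_dim
attn_params = attn_params_per_layer * NUM_LAYERS

# Cost of storing attention at 4 bits instead of 2 bits, weights only.
EXTRA_BITS = 4 - 2
extra_gb = attn_params * EXTRA_BITS / 8 / 1e9

print(f"attention weights: {attn_params:,}")
print(f"extra size at 4 bit vs 2 bit: {extra_gb:.2f} GB")
```

Under these assumptions the raw weights account for roughly a third of a gigabyte, which is in the same ballpark as the 0.4 GB quoted above once quantization metadata is added on top.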
|
|
|
### This is a 2+4 bit quantization of noromixmaidblah (just scroll down) using an emerging and apparently very robust quantization method, Half-Quadratic Quantisation. It ultimately squeezes its tokens out of HF Transformers, not one of the *lesser* inference tools. So what's juicy about this is that it *functions* with full Transformers sampler and tokeniser support but you only need a 3090 instead of an H100! Truly emancipatory.
|
|
|
...I'll do something smaller next time. |
|
|
|
My unwitting and presumably unwilling collaborators were the very clever people at [mobiusml - see their freaky maths at their github blog mini paper thing for HQQ](https://github.com/mobiusml/hqq). It's compatible with HF Transformers (including contrastive search baybee!) and is supported out of the box (I think) on text-generation-webui. |
|
For mobius's own description of what this is, see the template I followed, their quantization of a vanilla mixtral at [mobiuslabsgmbh/Mixtral-8x7B-v0.1-hf-attn-4bit-moe-2bit-HQQ](https://huggingface.co/mobiuslabsgmbh/Mixtral-8x7B-v0.1-hf-attn-4bit-moe-2bit-HQQ) |
|
|
|
+ My best guess at parsing the HQQ source is that it works by sort of... 'JIT de-quanti-' I have no idea, really. If you prefer talking to human beings over being lied to by language models (why are you here?) you could probably ask the MobiusML folks - they seem friendly, and compsci/engineer types tend to enjoy talking about their research and development. Weirdos.
|
|
|
|
|
I *think* this is a functioning quant of one of everyone's favorite norovirus-inspired language models, Noromaid. I wouldn't know - I can't load 90 gigabytes of BF16, so this is my first few minutes too.
|
|
|
#### see my oom-killer nightmare log. (my struggle with baby's first quant) in the [other markdown file.](MISLEAD.md/) |
|
note 5 mar 2024: oh yeah, if this is happening to you, there's a flag in the transformers function it's calling to just make it stop? idk why it wasn't on by default.
|
But even if you do want to know what I've learned - you're better off just asking me than trying to parse *that*. |
|
Just read the original card please: |
|
|
|
--- |
|
# Original README from the NeverSleep twins follows:
|
--- |
|
license: cc-by-nc-4.0 |
|
--- |
|
|
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/630dfb008df86f1e5becadc3/vwcJfOnL-2QDJ0ShfxRJ5.png) |
|
|
|
|
|
|
|
--- |
|
|
|
# Disclaimer: |
|
## This model is experimental, do not expect everything to work. |
|
|
|
This model uses the ChatML **prompting format**
|
|
|
--- |
|
|
|
|
|
Beeg noromaid on ***steroids***. Suitable for RP, ERP. |
|
|
|
This model was trained on the Zloss fork of Charles, and should fix the issues the model had.
|
|
|
Use the ChatML prompt format, but not the special tokens.
|
|
|
The reason is that Axolotl merges the finetune with the base model at 1.0 weight, basically, but this is too much, so I use another script available [HERE](https://github.com/DocShotgun/LLM-notebooks/blob/main/weighted-lora-merge.ipynb) to merge with less weight. Sadly, it doesn't take the special ChatML tokens. It's like Orca2 for that matter.
|
|
|
|
|
## Credits: |
|
- Undi |
|
- IkariDev |
|
|
|
<!-- description start --> |
|
## Description |
|
|
|
<!-- [Recommended settings - contributed by localfultonextractor](https://files.catbox.moe/ue0tja.json) --> |
|
|
|
This repo contains FP16 files of Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss. |
|
|
|
[FP16 - by IkariDev and Undi](https://huggingface.co/NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss) |
|
|
|
<!-- [GGUF - By TheBloke](https://huggingface.co/TheBloke/Athena-v4-GGUF)--> |
|
|
|
<!-- [GPTQ - By TheBloke](https://huggingface.co/TheBloke/Athena-v4-GPTQ)--> |
|
|
|
<!-- [exl2[8bpw-8h] - by AzureBlack](https://huggingface.co/AzureBlack/Echidna-13b-v0.3-8bpw-8h-exl2)--> |
|
|
|
<!-- [AWQ - By TheBloke](https://huggingface.co/TheBloke/Athena-v4-AWQ)--> |
|
|
|
<!-- [fp16 - by IkariDev+Undi95](https://huggingface.co/IkariDev/Athena-v4)--> |
|
|
|
[GGUF - by IkariDev and Undi](https://huggingface.co/NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss-GGUF) |
|
<!-- [OLD(GGUF - by IkariDev+Undi95)](https://huggingface.co/IkariDev/Athena-v4-GGUF)--> |
|
|
|
## Ratings: |
|
|
|
Note: We have permission from all users to upload their ratings, we DON'T screenshot random reviews without asking if we can put them here!
|
|
|
No ratings yet! |
|
|
|
If you want your rating to be here, send us a message over on DC and we'll put up a screenshot of it here. DC name is "ikaridev" and "undi". |
|
|
|
<!-- description end --> |
|
<!-- prompt-template start --> |
|
### Prompt format: Chatml |
|
```
<|im_start|>system
{sysprompt}<|im_end|>
<|im_start|>user
{input}<|im_end|>
<|im_start|>assistant
{output}<|im_end|>
```
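A minimal sketch of filling in that template from Python. Since the card says the merge did not carry over the special ChatML tokens, the markers are treated here as plain text and assembled with ordinary string formatting. The function name `build_chatml_prompt` and its arguments are hypothetical, made up for illustration, and not part of any library.

```python
def build_chatml_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    """Render a conversation in the ChatML layout shown above.

    `turns` is a list of (role, text) pairs with role "user" or "assistant".
    The result ends with an open assistant header so the model writes the reply.
    """
    parts = [f"<|im_start|>system\n{system}<|im_end|>\n"]
    for role, text in turns:
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        parts.append(f"<|im_start|>{role}\n{text}<|im_end|>\n")
    # Leave the final assistant turn open for generation.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt("You are a helpful narrator.", [("user", "Hello!")])
print(prompt)
```

When generating, it is probably worth adding `<|im_end|>` as a stop string in whatever frontend you use, since the model may not emit it as a single end-of-turn token.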
|
|
|
## Datasets used: |
|
|
|
- Aesir 1, 2 & 3 modified by us, credit to ([MinervaAI](https://huggingface.co/MinervaAI) / [Gryphe](https://huggingface.co/Gryphe)) |
|
- [LimaRP-20231109](https://huggingface.co/datasets/lemonilia/LimaRP) ([Lemonilia](https://huggingface.co/lemonilia)) |
|
- [ToxicQAFinal](https://huggingface.co/datasets/NobodyExistsOnTheInternet/ToxicQAFinal) ([NobodyExistsOnTheInternet](https://huggingface.co/NobodyExistsOnTheInternet))
|
- [No-robots-ShareGPT](https://huggingface.co/datasets/Doctor-Shotgun/no-robots-sharegpt) ([Doctor-Shotgun](https://huggingface.co/Doctor-Shotgun)) |
|
|
|
|
|
## Others |
|
|
|
Undi: If you want to support me, you can [here](https://ko-fi.com/undiai). |
|
|
|
IkariDev: Visit my [retro/neocities style website](https://ikaridevgit.github.io/) please kek |