Experimental model: a LimaRP QLoRA trained at 10k context length (longer than the longest LimaRP sample when tokenized with Mistral's tokenizer) on [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) using [Charles Goddard](https://huggingface.co/chargoddard)'s ZLoss and Megablocks-based fork of transformers, then fused into [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) at 0.5 weight.
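
The fused weights in this repo are what you want for inference. Purely as an illustration of what "fused at 0.5 weight" means, here is a minimal sketch (not the author's actual merge script) of a half-strength merge of the adapter linked below, assuming a plain linear scaling of the LoRA delta; the output path is hypothetical:

```
# Illustrative sketch only: halve the LoRA delta (W += 0.5 * scaling * B @ A),
# then merge it into the Instruct base and save the result.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "Doctor-Shotgun/limarp-zloss-mixtral-8x7b-qlora")

# Scaling lora_B by 0.5 scales the whole delta, since delta = scaling * B @ A.
for module in model.modules():
    if hasattr(module, "lora_B"):
        for linear in module.lora_B.values():
            linear.weight.data *= 0.5

merged = model.merge_and_unload()
merged.save_pretrained("./mixtral-limarp-zloss-0.5")  # hypothetical output path
```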

My current generation settings are:
```
Temperature: 1.25
Min-p: 0.05
Repetition penalty: 1.05
Repetition penalty range: 1024
```
These settings seem to avoid the Mixtral looping pitfalls for me so far. Play around with them and see what works well for you.
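
As a concrete example, here is a minimal sketch (illustrative, not an official snippet) of applying these settings via llama-cpp-python with one of TheBloke's GGUF quants linked below; the local filename, context size, and generation length are placeholders:

```
# Illustrative sketch: the sampler settings above, applied with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct-v0.1-limarp-zloss.Q4_K_M.gguf",  # placeholder filename
    n_ctx=10240,              # the QLoRA was trained at 10k context
    last_n_tokens_size=1024,  # repetition penalty range
)

prompt = "### Instruction:\n...\n### Response:\nCharacter:"  # see Usage below for the full format

out = llm(
    prompt,
    temperature=1.25,
    min_p=0.05,
    repeat_penalty=1.05,
    max_tokens=300,  # placeholder generation length
)
print(out["choices"][0]["text"])
```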

[Peft Adapter](https://huggingface.co/Doctor-Shotgun/limarp-zloss-mixtral-8x7b-qlora)

Quants courtesy of TheBloke:
- [GPTQ](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-GPTQ)
- [GGUF](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-GGUF)
- [AWQ](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-AWQ)

Exl2 Quants courtesy of LoneStriker:
- [2.4bpw](https://huggingface.co/LoneStriker/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-2.4bpw-h6-exl2)
- [3.0bpw](https://huggingface.co/LoneStriker/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-3.0bpw-h6-exl2)
- [3.5bpw](https://huggingface.co/LoneStriker/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-3.5bpw-h6-exl2)
- [3.75bpw](https://huggingface.co/LoneStriker/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-3.75bpw-h6-exl2)
- [4.0bpw](https://huggingface.co/LoneStriker/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-4.0bpw-h6-exl2)
- [5.0bpw](https://huggingface.co/LoneStriker/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-5.0bpw-h6-exl2)
- [6.0bpw](https://huggingface.co/LoneStriker/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-6.0bpw-h6-exl2)

## Usage:
The intended prompt format is the Alpaca instruction format of LimaRP v3:
```
### Instruction:
Character's Persona: {bot character description}

User's Persona: {user character description}

Scenario: {what happens in the story}

Play the role of Character. Taking the above information into consideration, you must engage in a roleplaying chat with User below this line. Do not write dialogues and narration for User.

### Response:
Character: {utterance}

### Input:
User: {utterance}

### Response:
Character: {utterance}

(etc.)
```
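
If you are templating the prompt in code rather than through a frontend, a tiny hypothetical helper (not part of this repo) that assembles the format above might look like:

```
# Hypothetical helper: builds the LimaRP v3 Alpaca prompt shown above
# from personas, a scenario, and a running chat history.
def build_prompt(bot_persona, user_persona, scenario, turns):
    header = (
        "### Instruction:\n"
        f"Character's Persona: {bot_persona}\n\n"
        f"User's Persona: {user_persona}\n\n"
        f"Scenario: {scenario}\n\n"
        "Play the role of Character. Taking the above information into "
        "consideration, you must engage in a roleplaying chat with User "
        "below this line. Do not write dialogues and narration for User.\n"
    )
    body = ""
    for speaker, text in turns:  # speaker is "user" or "character"
        if speaker == "user":
            body += f"\n### Input:\nUser: {text}\n"
        else:
            body += f"\n### Response:\nCharacter: {text}\n"
    # End with an open response tag so the model continues as Character.
    return header + body + "\n### Response:\nCharacter:"
```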

My current SillyTavern templates have been uploaded to a [folder](https://huggingface.co/Doctor-Shotgun/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss/tree/main/ST%20Templates) in this repo.

## Message length control
Due to the inclusion of LimaRP v3, it is possible to append a length modifier to the response instruction sequence, like this: