metadata

inference: false
license: other
language:
  - en
tags:
  - llama
  - alpaca
  - vicuna
  - mix
  - merge
  - model merge
  - roleplay
  - chat
  - instruct

Want to contribute? TheBloke's Patreon page

digitous' 13B HyperMantis GGML

These files are GGML format model files for digitous' 13B HyperMantis.

GGML files are for CPU + GPU inference using llama.cpp and the libraries and UIs that support this format.

Repositories available

THE FILES IN MAIN BRANCH REQUIRE LATEST LLAMA.CPP (May 19th 2023 - commit 2d5db48)!

llama.cpp recently made another breaking change to its quantisation methods - https://github.com/ggerganov/llama.cpp/pull/1508

I have quantised the GGML files in this repo with the latest version. Therefore you will require llama.cpp compiled on May 19th or later (commit 2d5db48 or later) to use them.

Provided files

Name	Quant method	Bits	Size	Max RAM required	Use case
13B-HyperMantis.ggmlv3.q4_0.bin	q4_0	4	7.32 GB	9.82 GB	4-bit.
13B-HyperMantis.ggmlv3.q4_1.bin	q4_1	4	8.14 GB	10.64 GB	4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models.
13B-HyperMantis.ggmlv3.q5_0.bin	q5_0	5	8.95 GB	11.45 GB	5-bit. Higher accuracy, higher resource usage and slower inference.
13B-HyperMantis.ggmlv3.q5_1.bin	q5_1	5	9.76 GB	12.26 GB	5-bit. Even higher accuracy, resource usage and slower inference.
13B-HyperMantis.ggmlv3.q8_0.bin	q8_0	8	13.83 GB	16.33 GB	8-bit. Almost indistinguishable from float16. Huge resource use and slow. Not recommended for normal use.

Note: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
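As a rough illustration of how to read the table, the sketch below picks the largest file whose "Max RAM required" figure fits a given memory budget. The figures are copied from the table above; the function name `pick_quant` and the idea of choosing purely by RAM budget are assumptions made for this example, and GPU offloading is ignored.

```python
# (file name, quant method, bits, file size in GB, max RAM required in GB),
# copied from the table above. RAM figures assume no GPU offloading.
QUANTS = [
    ("13B-HyperMantis.ggmlv3.q4_0.bin", "q4_0", 4, 7.32, 9.82),
    ("13B-HyperMantis.ggmlv3.q4_1.bin", "q4_1", 4, 8.14, 10.64),
    ("13B-HyperMantis.ggmlv3.q5_0.bin", "q5_0", 5, 8.95, 11.45),
    ("13B-HyperMantis.ggmlv3.q5_1.bin", "q5_1", 5, 9.76, 12.26),
    ("13B-HyperMantis.ggmlv3.q8_0.bin", "q8_0", 8, 13.83, 16.33),
]


def pick_quant(ram_budget_gb):
    """Return the file with the largest RAM figure that fits, or None."""
    fitting = [q for q in QUANTS if q[4] <= ram_budget_gb]
    if not fitting:
        return None
    # In this table a larger RAM figure goes with higher accuracy,
    # so the largest fitting entry is the most accurate one available.
    return max(fitting, key=lambda q: q[4])[0]


if __name__ == "__main__":
    print(pick_quant(12.0))
```

With a 12 GB budget this selects the q5_0 file, since q5_1 needs 12.26 GB. Note that the table does not recommend q8_0 for normal use, so a large budget is not by itself a reason to pick it.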

How to run in `llama.cpp`

I use the following command line; adjust for your tastes and needs:

./main -t 10 -ngl 32 -m 13B-HyperMantis.ggmlv3.q5_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write a story about llamas\n### Response:"

Change -t 10 to the number of physical CPU cores you have. For example, if your system has 8 cores/16 threads, use -t 8.

Change -ngl 32 to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.

If you want to have a chat-style conversation, replace the -p <PROMPT> argument with -i -ins.
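The adjustments described above can be sketched as a small helper that assembles the argument list. This is only an illustration: the function name `build_llama_cmd` and its parameters are made up for this example, and it uses only the flags that appear in the command line above.

```python
import shlex


def build_llama_cmd(model_path, threads, gpu_layers=0, prompt=None):
    """Build the ./main argument list following the notes above."""
    cmd = ["./main", "-t", str(threads)]
    if gpu_layers > 0:
        # -ngl is left out entirely when there is no GPU acceleration.
        cmd += ["-ngl", str(gpu_layers)]
    cmd += ["-m", model_path, "--color", "-c", "2048",
            "--temp", "0.7", "--repeat_penalty", "1.1", "-n", "-1"]
    if prompt is None:
        # Chat-style conversation: -i -ins takes the place of -p <PROMPT>.
        cmd += ["-i", "-ins"]
    else:
        cmd += ["-p", prompt]
    return cmd


if __name__ == "__main__":
    # Python turns \n into a real newline inside this string.
    cmd = build_llama_cmd(
        "13B-HyperMantis.ggmlv3.q5_0.bin",
        threads=8,       # physical cores, not threads
        gpu_layers=32,
        prompt="### Instruction: Write a story about llamas\n### Response:",
    )
    print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd)` from the directory that contains the `main` binary; `shlex.join` is used here only to print a shell-quoted version for inspection.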

How to run in `text-generation-webui`

Further instructions here: text-generation-webui/docs/llama.cpp-models.md.

Discord

For further support, and discussions on these models and AI in general, join us at:

TheBloke AI's Discord server

Thanks, and how to contribute.

Thanks to the chirper.ai team!

I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.

If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.

Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Patreon special mentions: Aemon Algiz, Dmitriy Samsonov, Nathan LeClaire, Trenton Dambrowitz, Mano Prime, David Flickinger, vamX, Nikolai Manek, senxiiz, Khalefa Al-Ahmad, Illia Dulskyi, Jonathan Leane, Talal Aujan, V. Lukas, Joseph William Delisle, Pyrater, Oscar Rangel, Lone Striker, Luke Pendergrass, Eugene Pentland, Sebastain Graf, Johann-Peter Hartman.

Thank you to all my generous patrons and donors!

Original model card: digitous' 13B HyperMantis

13B-HyperMantis

13B-HyperMantis is a weight-sum multi-model merge composed of:

((MantiCore3E+VicunaCocktail)+(SuperCOT+(StorytellingV2+BluemoonRP))) [All 13B Models]
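To make the nesting in that formula concrete, here is a toy sketch of a weight-sum merge applied in the same order. It is not the author's merge script: the card does not state the blend ratios, so the 50/50 ratio is an assumption, the function name `weight_sum` is hypothetical, and each "model" is a stand-in dictionary with a single one-value parameter. SuperCOT and StorytellingV2 are credited as LoRAs, which would presumably have to be applied to a base model first; the sketch skips that step.

```python
def weight_sum(a, b, ratio=0.5):
    """Blend two models parameter by parameter: ratio * a + (1 - ratio) * b."""
    if a.keys() != b.keys():
        raise ValueError("models must share the same parameter names")
    return {
        name: [ratio * x + (1.0 - ratio) * y for x, y in zip(a[name], b[name])]
        for name in a
    }


# Toy stand-ins for the five 13B checkpoints (values are arbitrary).
manticore = {"w": [1.0]}
vicuna_cocktail = {"w": [3.0]}
supercot = {"w": [5.0]}
storytelling = {"w": [2.0]}
bluemoon = {"w": [6.0]}

# ((MantiCore3E+VicunaCocktail)+(SuperCOT+(StorytellingV2+BluemoonRP)))
left = weight_sum(manticore, vicuna_cocktail)
right = weight_sum(supercot, weight_sum(storytelling, bluemoon))
merged = weight_sum(left, right)

if __name__ == "__main__":
    print(merged)
```

One consequence of nesting equal-ratio merges is that the models do not contribute equally: under the 50/50 assumption the two innermost models each end up with a 1/8 share, while MantiCore3E, VicunaCocktail and SuperCOT each get 1/4.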

(GGML and GPTQ are no longer in this repo and will be migrated to a separate repo for easier git downloads.)

Subjective testing shows quality results with KoboldAI (similar results are likely in Text Generation Webui; disregard the KAI-centric settings on that platform) using the Godlike preset with these tweaks: 2048 context, 800 Output Length, 1.3 Temp, 1.13 Repetition Penalty, AltTextGen:On, AltRepPen:Off, No Prompt Gen:On.

Despite being primarily uncensored Vicuna models at its core, HyperMantis seems to respond best to the Alpaca instruct format, speculatively because Manticore's eclectic instruct datasets generalize the model's understanding of instruct formats to some degree. What is known is that HyperMantis responds best to the formality of Alpaca's format, whereas Human/Assistant appears to trigger vestigial traces of moralizing and servitude that aren't conducive to roleplay or freeform instructions.

Here is an example of what to place in KAI's Memory (or TGUI's equivalent) to leverage chat as a Roleplay Adventure. [Define the roles of the named Human and AI here; let's say our name is 'Player' and we named the AI 'Narrator']

Game Mode:Chat [Remember to name yourself and the AI and reference them in the instruction block]

### Instruction:

Make Narrator perform as a text based adventure game with Player as Narrator's user input. Make Narrator describe the scene, scenario, actions of characters, reactions of characters to the player's actions, and potential consequences of their actions and Player's actions when relevant with visually descriptive, detailed, and long storytelling. Allow characters and Player to converse to immerse Player in a rich narrative driven story. When Player encounters a new character, Narrator will name the new character and describe their behavior and appearance. Narrator will internally determine their underlying motivations and weave it into the story where possible.

### Response: [Put A Carriage Return Here]

In KAI, this is why 'No Prompt Gen:On' is important: make your first entry a short writeup of your current situation, or simply reiterate that Narrator is a text adventure game and Player is the input. Your next entry, even though it goes through a plain chat interface, then kicks off what happens next for Narrator to riff off of. In TGUI, an equivalent setup works the same way. Of course, tailor this to whatever you want it to be; instruct models can be as versatile as your imagination. If things go sideways, have fun.
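For anyone assembling the memory block in a script rather than pasting it by hand, the Alpaca-style layout shown above can be sketched as follows. The function name `alpaca_prompt` is made up for this example, and the shortened instruction text is only a placeholder for the full adventure instruction.

```python
def alpaca_prompt(instruction):
    """Wrap an instruction in the Alpaca-style block shown above."""
    # A blank line follows the "### Instruction:" header, and the block ends
    # with a newline after "### Response:" (the carriage return asked for).
    return f"### Instruction:\n\n{instruction.strip()}\n\n### Response:\n"


if __name__ == "__main__":
    ai_name, player_name = "Narrator", "Player"
    instruction = (
        f"Make {ai_name} perform as a text based adventure game "
        f"with {player_name} as {ai_name}'s user input."
    )
    print(alpaca_prompt(instruction))
```

Keeping the names in variables makes it easy to follow the reminder above to name yourself and the AI and to reference both names inside the instruction block.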

Possibly also useful as a regular chatbot, waifu, husbando, TavernAI character, freeform instruct shenanigans, it's whatever. 4bit-128g safetensor [Cuda] included for convenience, might do ggml. Mileage may vary, warranty void if the void stares back.

Credits:

manticore-13b [Epoch3] by openaccess-ai-collective

https://huggingface.co/openaccess-ai-collective/manticore-13b

vicuna-13b-cocktail by reeducator

https://huggingface.co/reeducator/vicuna-13b-cocktail

SuperCOT-LoRA [13B] by kaiokendev

https://huggingface.co/kaiokendev/SuperCOT-LoRA

Storytelling-LLaMa-LoRA [13B, Version 2] by GamerUntouch

https://huggingface.co/GamerUntouch/Storytelling-LLaMa-LoRAs

bluemoonrp-13b by reeducator

https://huggingface.co/reeducator/bluemoonrp-13b

"Such as gravity's rainbow, sufficiently complex systems stir emergent behavior near imperceptible, uncanny; a Schrodinger's puzzlebox of what may be intrinsic or agentic. Best not to startle what black box phantoms there may be."