
## Notes

- There is no prompt template; the prompt is just BOS + plain text.
- It can also start a story from an empty prompt.
- Temperature, repetition penalty, and other sampling settings should all be left at their defaults.
- It will not go lewd immediately; it will try to form a coherent story first.
- It's best to generate 1-3 paragraphs at a time; it loses coherence if you try to make it generate the full context all at once (see the sketch after this list).
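As a rough illustration of the notes above, here is a minimal sketch using llama-cpp-python: a raw BOS + text prompt with no chat template, sampling left at defaults, and a short generation of roughly a paragraph per call. The model path, prompt text, and token count are placeholders, not part of this release.

```python
from llama_cpp import Llama

# Load one of the GGUF quants (placeholder path; use whichever file you downloaded).
llm = Llama(
    model_path="llama-3-8b-lewd-stories-v6-16k.Q4_K_M.gguf",
    n_ctx=16384,  # the LLaMA-3 build is RoPE'd to 16k context
)

# No template: BOS is prepended by the tokenizer, the prompt is just plain story text.
prompt = "The rain had not stopped for three days when she finally knocked on his door."

# Keep sampling at defaults; ask for about a paragraph, then append the output to the
# prompt and call again, instead of generating the whole context in one pass.
out = llm(prompt, max_tokens=200)
print(out["choices"][0]["text"])
```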

## LLaMA-3-8B base

RoPE'd to 16k context.

| Name | Quant | Size (GB) | VRAM (GB, with FA) | VRAM (GB, no FA) |
|------|-------|-----------|--------------------|------------------|
| llama-3-8b-lewd-stories-v6-16k.F16 | F16 | 14.9 | 16.6 | 17.4 |
| llama-3-8b-lewd-stories-v6-16k.Q8_0 | Q8_0 | 8.0 | 10.1 | 10.5 |
| llama-3-8b-lewd-stories-v6-16k.Q6_K | Q6_K | 6.1 | 8.4 | 9.2 |
| llama-3-8b-lewd-stories-v6-16k.Q5_K_M | Q5_K_M | 5.3 | 7.6 | 8.1 |
| llama-3-8b-lewd-stories-v6-16k.Q4_K_M | Q4_K_M | 4.6 | 6.9 | 7.8 |
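The "with FA" column assumes flash attention is enabled at load time. A hedged sketch of loading a quant that way with llama-cpp-python follows; the `flash_attn` flag exists in recent llama-cpp-python builds (this is an assumption about your installed version), and the file name is again a placeholder.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="llama-3-8b-lewd-stories-v6-16k.Q5_K_M.gguf",  # placeholder: any quant from the table
    n_ctx=16384,
    n_gpu_layers=-1,   # offload all layers so the VRAM figures above apply
    flash_attn=True,   # compare against the "no FA" column when this is False
)
```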

## Yi-1.5-9B-32K

Native 32k context.

| Name | Quant | Size (GB) | VRAM (GB, with FA) | VRAM (GB, no FA) |
|------|-------|-----------|--------------------|------------------|
| yi-lewd-stories-32k.F16 | F16 | 16.4 | | |
| yi-lewd-stories-32k.Q8_0 | Q8_0 | 8.7 | | |
| yi-lewd-stories-32k.Q6_K | Q6_K | 6.7 | | |
| yi-lewd-stories-32k.Q5_K_M | Q5_K_M | 5.8 | | |
| yi-lewd-stories-32k.Q4_K_M | Q4_K_M | 5.0 | | |

## Mistral-7B-v0.3

Native 32k context.

| Name | Quant | Size (GB) | VRAM (GB, with FA) | VRAM (GB, no FA) |
|------|-------|-----------|--------------------|------------------|
| mistral-lewd-stories-32k.F16 | F16 | 13.5 | | |
| mistral-lewd-stories-32k.Q8_0 | Q8_0 | 7.2 | | |
| mistral-lewd-stories-32k.Q6_K | Q6_K | 5.5 | | |
| mistral-lewd-stories-32k.Q5_K_M | Q5_K_M | 4.8 | | |
| mistral-lewd-stories-32k.Q4_K_M | Q4_K_M | 4.0 | | |