InfinityKumon-2x7B

GGUF - Imatrix quant of InfinityKumon-2x7B

Another MoE merge, this time of Endevor/InfinityRP-v1-7B and grimjim/kukulemon-7B.

The reason? I like InfinityRP-v1-7B so much that I wondered whether I could improve it even further by merging two great models into a MoE.

Perplexity

Measured with llama.cpp's perplexity tool on a private roleplay dataset.

| Format  | PPL                |
|---------|--------------------|
| FP16    | 3.1748 +/- 0.11928 |
| Q8_0    | 3.1734 +/- 0.11935 |
| Q6_K    | 3.1752 +/- 0.11899 |
| Q5_K_M  | 3.1731 +/- 0.11892 |
| IQ4_NL  | 3.1752 +/- 0.11943 |
| IQ3_M   | 3.1773 +/- 0.11528 |
| Q2_K    | 3.2309 +/- 0.11996 |

Based on the perplexity results, I don't really recommend Q2_K; the other quants are fine.
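For reference, the PPL numbers above are computed the standard way: perplexity is the exponential of the average negative log-likelihood per token (natural log, as llama.cpp uses). A minimal sketch, with made-up token probabilities for illustration:

```python
import math

def perplexity(log_probs):
    """exp of the mean negative log-likelihood over the evaluated tokens."""
    return math.exp(-sum(log_probs) / len(log_probs))

# Toy per-token log-probabilities (assumed values, not from the model).
logps = [math.log(0.5), math.log(0.25), math.log(0.125)]
print(perplexity(logps))  # inverse geometric mean of the probabilities
```

Lower is better: a drop from 3.17 (Q5_K_M) to 3.23 (Q2_K) means the model is measurably less certain about the evaluation text, which is why Q2_K is not recommended here.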

Prompt format:

Alpaca or ChatML
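As a sketch of what the two supported formats look like (the system/user text is a placeholder, not part of the template):

```text
ChatML:
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant

Alpaca:
{system prompt}

### Instruction:
{user message}

### Response:
```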

Switch: FP16 - GGUF

GGUF
Model size: 12.9B params
Architecture: llama
