mradermacher/Llama-2-70b-x8-MoE-clown-truck-i1-GGUF

About

No IQ1_S quant is available because llama.cpp currently crashes when trying to generate it.

If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including how to concatenate multi-part files.
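As a rough illustration of the concatenation step, here is a minimal Python sketch. It assumes the parts are plain byte-wise splits whose names end in `.partNofM` (for example `model.gguf.part1of4`); that naming pattern and the helper names `part_index` and `concatenate_parts` are assumptions for this example, so check the actual file names after downloading.

```python
import re
import shutil
from pathlib import Path


def part_index(path: Path) -> int:
    """Return N from a file name ending in '.partNofM' (assumed naming scheme)."""
    match = re.search(r"\.part(\d+)of(\d+)$", path.name)
    if match is None:
        raise ValueError(f"not a recognised part file: {path.name}")
    return int(match.group(1))


def concatenate_parts(parts, output) -> Path:
    """Join byte-wise split parts, in numeric order, into a single file."""
    # Sort numerically so that part10 comes after part9, not after part1.
    ordered = sorted((Path(p) for p in parts), key=part_index)
    with open(output, "wb") as out:
        for part in ordered:
            with open(part, "rb") as src:
                # Stream in chunks so a large part is never held in memory.
                shutil.copyfileobj(src, out, length=16 * 1024 * 1024)
    return Path(output)
```

A call might look like `concatenate_parts(Path(".").glob("model.gguf.part*of*"), "model.gguf")`, where `model.gguf` stands in for the real file name. On Linux or macOS, `cat` with the parts listed in order does the same job.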

(Sorted by size, not necessarily by quality. IQ-quants are often preferable to similarly sized non-IQ quants.)

Link	Type	Size/GB	Notes
PART 1 PART 2 PART 3 PART 4	i1-IQ2_M	153.4
PART 1 PART 2 PART 3 PART 4	i1-Q2_K	170.8	IQ3_XXS probably better
PART 1 PART 2 PART 3 PART 4	i1-IQ3_XXS	180.2	lower quality
PART 1 PART 2 PART 3 PART 4	i1-IQ3_XS	190.5
P1 P2 P3 P4 P5	i1-IQ3_S	202.0	beats Q3_K*
P1 P2 P3 P4 P5	i1-Q3_K_S	202.0	IQ3_XS probably better
P1 P2 P3 P4 P5	i1-IQ3_M	211.9
P1 P2 P3 P4 P5	i1-Q3_K_M	223.1	IQ3_S probably better
P1 P2 P3 P4 P5	i1-Q3_K_L	239.3	IQ3_M probably better
P1 P2 P3 P4 P5 P6	i1-IQ4_XS	248.3
P1 P2 P3 P4 P5 P6	i1-Q4_K_S	264.9	optimal size/speed/quality
P1 P2 P3 P4 P5 P6	i1-Q4_K_M	282.0	fast, recommended
P1 P2 P3 P4 P5 P6 P7	i1-Q5_K_S	319.7
P1 P2 P3 P4 P5 P6 P7	i1-Q5_K_M	329.7
P1 P2 P3 P4 P5 P6 P7 P8	i1-Q6_K	381.0	practically like static Q6_K
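If you want to narrow the table down by size, a small sketch like the following may help. The sizes are copied from the table above; the function name `largest_fitting` and the budget values are hypothetical. Note that file size is only a lower bound on what you need at runtime, and that the largest file that fits is not necessarily the best quality, as the Notes column points out.

```python
# (type, size in GB) pairs copied from the table above, in the same order.
QUANTS = [
    ("i1-IQ2_M", 153.4),
    ("i1-Q2_K", 170.8),
    ("i1-IQ3_XXS", 180.2),
    ("i1-IQ3_XS", 190.5),
    ("i1-IQ3_S", 202.0),
    ("i1-Q3_K_S", 202.0),
    ("i1-IQ3_M", 211.9),
    ("i1-Q3_K_M", 223.1),
    ("i1-Q3_K_L", 239.3),
    ("i1-IQ4_XS", 248.3),
    ("i1-Q4_K_S", 264.9),
    ("i1-Q4_K_M", 282.0),
    ("i1-Q5_K_S", 319.7),
    ("i1-Q5_K_M", 329.7),
    ("i1-Q6_K", 381.0),
]


def largest_fitting(budget_gb, quants=QUANTS):
    """Return the largest (type, size) pair whose file size fits the budget.

    Returns None if nothing fits. On a size tie, the entry listed first wins.
    """
    fitting = [q for q in quants if q[1] <= budget_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda q: q[1])
```

For example, `largest_fitting(200)` selects the IQ3_XS entry, while a budget below 153.4 GB yields `None`.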

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

See https://huggingface.co/mradermacher/model_requests for some answers to questions you might have and/or if you want some other model quantized.

I thank my company, nethype GmbH, for letting me use its servers and providing upgrades to my workstation to enable this work in my free time.