Create a GGUF for this, please.
@LoneStriker Can you quantize this model to exl2?
@MaziyarPanahi
@LoneStriker
Can you confirm that these quants work on your end?
Whenever I tested the Q6_K and Q8_0 quants, they produced only gibberish.
Only the Q4_K_M from the original author works for me at the moment.
Thanks for your efforts for the open-source community.
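For reference, a minimal sketch of the kind of sanity check I mean, using the llama-cpp-python bindings; the model filename and generation settings are placeholders, not my exact setup.

```python
# Minimal sanity check for a GGUF quant with llama-cpp-python.
# Assumes `pip install llama-cpp-python`; the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="model.Q6_K.gguf",  # placeholder: point this at the quant under test
    n_ctx=2048,
    verbose=False,
)

out = llm("The capital of France is", max_tokens=16, temperature=0.0)
print(out["choices"][0]["text"])  # gibberish here indicates a broken quant
```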
Hi
@ishanparihar
There have been a lot of changes to MoE models introduced recently in Llama.cpp. I think it's best if I make a new quant with a new build and test this again.
Thanks @MaziyarPanahi for your prompt response and your willingness to help. Looking forward to the new builds.
My own (ZeroWw) quantizations: the output and embedding tensors are quantized to f16, and all other tensors are quantized to q5_k or q6_k.
Result: both the f16/q6 and f16/q5 variants are smaller than the standard q8_0 quantization, and they perform as well as the pure f16 model.
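A minimal sketch of how such a mixed quant could be produced with llama.cpp's llama-quantize tool, assuming a recent build that supports the `--output-tensor-type` and `--token-embedding-type` flags; all file names and paths are placeholders.

```python
# Sketch: produce an "f16 output/embeddings, q6_k everything else" quant
# by shelling out to llama.cpp's llama-quantize. Paths and the binary
# location are placeholders; the flags assume a recent llama.cpp build.
import subprocess

subprocess.run(
    [
        "./llama-quantize",
        "--output-tensor-type", "f16",    # keep the output tensor at f16
        "--token-embedding-type", "f16",  # keep the token embeddings at f16
        "model-f16.gguf",                 # placeholder: full-precision input GGUF
        "model-f16-q6_k.gguf",            # placeholder: output file
        "Q6_K",                           # base type for all remaining tensors
    ],
    check=True,
)
```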