Lewdiculous/Nyanade_Stunna-Maid-7B-v0.2-GGUF-IQ-Imatrix

Lewdiculous

Owner Apr 15

Quants being prepared. You can link already if you want.

Lewdiculous pinned discussion Apr 15

Lewdiculous

Owner Apr 15

•

edited Apr 15

Did some reorganization of the ReadMe, let me know if anything is unclear or can be improved.

Nitral-AI

Apr 15

Looks good from my house! (albeit I usually am a lazy bastard with the descriptions)

Lewdiculous

Owner Apr 15

•

edited Apr 15

When I'm not feeling lazy, I should also add some VRAM => QUANT recommendations, but at this point I expect people to just figure out what is best for them.

Lewdiculous

Owner Apr 15

•

edited Apr 15

I tried:

A few general recommendations for quant options:

These assume a context size of 8192 for simplicity, 1GB of VRAM overhead for the operating system, and some safety margin to avoid overflowing buffers.

For 11-12GB VRAM: A GPU with 12GB of VRAM capacity can comfortably use the Q6_K-imat quant option and run it at good speeds.
This is the same with or without using #vision capabilities.

For 8GB VRAM: If not using #vision, for GPUs with 8GB of VRAM capacity the Q5_K_M-imat quant option will fit comfortably and should run at good speeds.
If you are also using #vision from this model, opt for the Q4_K_M-imat quant option to avoid filling the buffers and potential slowdowns.

For 6GB VRAM: If not using #vision, for GPUs with 6GB of VRAM capacity the IQ3_M-imat quant option should fit comfortably to run at good speeds.
If you are also using #vision from this model, opt for the IQ3_XXS-imat quant option.
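
For anyone who wants the tiers above as a quick lookup, here is a minimal Python sketch. The `recommend_quant` helper and the `TIERS` table are hypothetical names made up for illustration, not part of any tool. It also assumes that cards falling between the listed sizes (for example 10GB) simply use the next tier down, and that anything under 6GB gets no recommendation, since neither case is covered above.

```python
from typing import Optional

# Tiers transcribed from the recommendations above (8192 context assumed):
# (minimum VRAM in GB, quant without #vision, quant with #vision).
# Treating each value as a lower bound is an assumption of this sketch.
TIERS = [
    (11, "Q6_K-imat", "Q6_K-imat"),
    (8, "Q5_K_M-imat", "Q4_K_M-imat"),
    (6, "IQ3_M-imat", "IQ3_XXS-imat"),
]


def recommend_quant(vram_gb: float, vision: bool = False) -> Optional[str]:
    """Return the suggested quant for a GPU, or None below the smallest tier."""
    for min_vram, no_vision, with_vision in TIERS:
        if vram_gb >= min_vram:
            return with_vision if vision else no_vision
    # No recommendation was given for GPUs with less than 6GB of VRAM.
    return None


if __name__ == "__main__":
    for vram in (12, 8, 6):
        print(vram, recommend_quant(vram), recommend_quant(vram, vision=True))
```

Treat the result as a starting point only; actual fit depends on context size and whatever else is using VRAM.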


General discussion.