	---
	license: other
	license_name: sacla
	license_link: >-
	https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md
	base_model:
	- stabilityai/stable-diffusion-3.5-large
	base_model_relation: quantized
	---
	## Overview
	These models are made to work with [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) release [master-ac54e00](https://github.com/leejet/stable-diffusion.cpp/releases/tag/master-ac54e00) onwards. Support for other inference backends is not guaranteed.

	Quantized using [this PR](https://github.com/leejet/stable-diffusion.cpp/pull/447).

	Normal K-quants do not work properly with SD3.5-Large models because around 90% of the weights are in tensors whose shape is not compatible with the 256-weight superblock size of K-quants, so those tensors can't be quantized this way.
	Mixing quantization types allows us to take advantage of the better fidelity of K-quants to some extent while keeping the model file size relatively small.

	Only the second layers of both MLPs in each MMDiT block of SD3.5 Large models have the correct shape to be compatible with K-quants. These layers still account for about 10% of all the parameters.
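
The shape constraint above can be illustrated with a minimal sketch. It assumes that quantization blocks run along a tensor's row length (the input feature count of a linear layer), that SD3.5 Large uses a hidden width of 2432 with a 4x MLP expansion (9728), and that anything incompatible with 32-weight blocks is left in f16. The function name and the f16 fallback are hypothetical and not taken from stable-diffusion.cpp.

```python
QK_K = 256       # superblock size of K-quants (q2_k, q3_k, q4_k, ...)
QK_LEGACY = 32   # block size of legacy types (q4_0, q4_1, q5_0, ...)


def pick_quant_type(row_len: int, k_type: str = "q4_k", fallback: str = "q4_0") -> str:
    """Choose a quantization type for a tensor from its row length.

    Hypothetical helper: use the K-quant when the row splits evenly into
    256-weight superblocks, otherwise the legacy type, otherwise f16.
    """
    if row_len % QK_K == 0:
        return k_type
    if row_len % QK_LEGACY == 0:
        return fallback
    return "f16"


# Assumed SD3.5 Large dimensions (not stated in this README).
HIDDEN = 2432          # input width of attention projections and the first MLP layer
MLP_HIDDEN = 4 * 2432  # input width of the second MLP layer

if __name__ == "__main__":
    for name, row_len in [("mlp.fc1", HIDDEN), ("mlp.fc2", MLP_HIDDEN)]:
        print(name, row_len, pick_quant_type(row_len))
```

Under these assumptions, 2432 is a multiple of 32 but not of 256, while 9728 is a multiple of 256, which is consistent with only the second MLP layers being eligible for K-quants.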

	## Files:

	### Mixed Types:


	- [sd3.5_large-q2_k_4_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-GGUF-mixed-sdcpp/blob/main/sd3.5_large-q2_k_4_0.gguf): Smallest quantization yet. Use this if you can't afford anything bigger
	- [sd3.5_large-q3_k_4_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-GGUF-mixed-sdcpp/blob/main/sd3.5_large-q3_k_4_0.gguf)
	- [sd3.5_large-q4_k_4_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-GGUF-mixed-sdcpp/blob/main/sd3.5_large-q4_k_4_0.gguf): Exactly the same size as q4_0, but with slightly less degradation. Recommended
	- [sd3.5_large_turbo-q4_k_4_1.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/sd3.5_large_turbo-q4_k_4_1.gguf): Smaller than q4_1, and with comparable degradation. Recommended
	- [sd3.5_large_turbo-q4_k_5_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/sd3.5_large_turbo-q4_k_5_0.gguf): Smaller than q5_0, and with comparable degradation. Very close to the original f16 already. Recommended
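
The size comparisons in this list can be checked with a rough back-of-the-envelope sketch. It assumes that a name like `q4_k_4_1` means "q4_k where the shape allows, q4_1 everywhere else" (my reading of the file names), that about 10% of the weights get the K-quant as stated in the overview, and it ignores any tensors kept in higher precision. The bits-per-weight values follow from the ggml block layouts (bytes per block times 8, divided by weights per block).

```python
# Bits per weight derived from ggml block sizes.
BITS_PER_WEIGHT = {
    "q2_k": 84 * 8 / 256,
    "q3_k": 110 * 8 / 256,
    "q4_k": 144 * 8 / 256,
    "q4_0": 18 * 8 / 32,
    "q4_1": 20 * 8 / 32,
    "q5_0": 22 * 8 / 32,
}


def mixed_bpw(k_type: str, fallback: str, k_fraction: float = 0.10) -> float:
    """Average bits per weight when k_fraction of the weights use k_type
    and the remainder use the fallback type (hypothetical helper)."""
    return (k_fraction * BITS_PER_WEIGHT[k_type]
            + (1.0 - k_fraction) * BITS_PER_WEIGHT[fallback])


if __name__ == "__main__":
    for k_type, fallback in [("q2_k", "q4_0"), ("q3_k", "q4_0"), ("q4_k", "q4_0"),
                             ("q4_k", "q4_1"), ("q4_k", "q5_0")]:
        print(f"{k_type} + {fallback}: {mixed_bpw(k_type, fallback):.3f} bits/weight")
```

Since q4_k and q4_0 both cost 4.5 bits per weight, the q4_k_4_0 mix comes out the same size as plain q4_0, while the q4_k_4_1 and q4_k_5_0 mixes land slightly below plain q4_1 and q5_0, matching the notes above.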

	### Legacy types:

	- [sd3.5_large_turbo-q4_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q4_0.gguf): Same size as q4_k_4_0. Not recommended (use q4_k_4_0 instead)
	- [sd3.5_large_turbo-q4_1.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q4_1.gguf): Not recommended (q4_k_4_1 is better and smaller)
	- [sd3.5_large_turbo-q5_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q5_0.gguf): Barely better and bigger than q4_k_5_0
	- [sd3.5_large_turbo-q5_1.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q5_1.gguf): Better and bigger than q5_0
	- [sd3.5_large_turbo-q8_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q8_0.gguf): Basically indistinguishable from the original f16, but much smaller. Recommended for best quality

	## Outputs:

	Sorted by model size (note that q4_0 and q4_k_4_0 are exactly the same size).

	| Quantization | Robot girl | Text | Cute kitten |
	| ------------------ | -------------------------------- | ---------------------------------- | ---------------------------------- |
	| q2_k_4_0 | ![q2_k_4_0](Images/q2_k_4_0.png) | ![q2_k_4_0](Images/1_q2_k_4_0.png) | ![q2_k_4_0](Images/2_q2_k_4_0.png) |
	| q3_k_4_0 | ![q3_k_4_0](Images/q3_k_4_0.png) | ![q3_k_4_0](Images/1_q3_k_4_0.png) | ![q3_k_4_0](Images/2_q3_k_4_0.png) |
	| q4_0 | ![q4_0](Images/q4_0.png) | ![q4_0](Images/1_q4_0.png) | ![q4_0](Images/2_q4_0.png) |
	| q4_k_4_0 | ![q4_k_4_0](Images/q4_k_4_0.png) | ![q4_k_4_0](Images/1_q4_k_4_0.png) | ![q4_k_4_0](Images/2_q4_k_4_0.png) |
	| q4_k_4_1 | ![q4_k_4_1](Images/q4_k_4_1.png) | ![q4_k_4_1](Images/1_q4_k_4_1.png) | ![q4_k_4_1](Images/2_q4_k_4_1.png) |
	| q4_1 | ![q4_1](Images/q4_1.png) | ![q4_1](Images/1_q4_1.png) | ![q4_1](Images/2_q4_1.png) |
	| q4_k_5_0 | ![q4_k_5_0](Images/q4_k_5_0.png) | ![q4_k_5_0](Images/1_q4_k_5_0.png) | ![q4_k_5_0](Images/2_q4_k_5_0.png) |
	| q5_0 | ![q5_0](Images/q5_0.png) | ![q5_0](Images/1_q5_0.png) | ![q5_0](Images/2_q5_0.png) |

	All images were generated with only 28 steps and a CFG scale of 4.5.

	Generated with a modified version of sdcpp with [this PR](https://github.com/leejet/stable-diffusion.cpp/pull/397) applied to enable clip timestep embeddings support.

	Text encoders used: q4_k quant of t5xxl, full precision clip_g, and q8 quant of [ViT-L-14-TEXT-detail-improved-hiT-GmP-TE-only-HF](https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14) in place of clip_l.
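
As a rough starting point for reproducing similar outputs, the sketch below assembles an argument list for the stable-diffusion.cpp `sd` command-line tool with the step count and CFG scale given above. The text encoder file names and the output path are hypothetical placeholders, and the flag names are the ones I believe the CLI accepted around the referenced release, so check them against `sd --help` for your build. It does not reproduce the modified build used for the images in this README.

```python
def build_sd_command(model_path: str, prompt: str,
                     clip_l: str = "clip_l.safetensors",   # placeholder path
                     clip_g: str = "clip_g.safetensors",   # placeholder path
                     t5xxl: str = "t5xxl_q4_k.gguf",       # placeholder path
                     steps: int = 28, cfg_scale: float = 4.5,
                     output: str = "output.png") -> list:
    """Return an argv list for the `sd` executable (illustrative only)."""
    return [
        "sd",
        "-m", model_path,          # quantized diffusion model from this repo
        "--clip_l", clip_l,
        "--clip_g", clip_g,
        "--t5xxl", t5xxl,
        "-p", prompt,
        "--steps", str(steps),
        "--cfg-scale", str(cfg_scale),
        "-o", output,
    ]


if __name__ == "__main__":
    cmd = build_sd_command("sd3.5_large-q4_k_4_0.gguf", "a cute kitten")
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.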

	Full prompts and settings are in the PNG metadata.
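
To read that metadata without extra dependencies, here is a minimal sketch that walks the PNG chunk list and collects uncompressed `tEXt` entries. It assumes the settings are stored in `tEXt` chunks (compressed `zTXt` and `iTXt` chunks are skipped), and the key names found in the files are not something this sketch guarantees.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict:
    """Return {keyword: text} for every tEXt chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    texts = {}
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            keyword, _, value = body.partition(b"\x00")
            texts[keyword.decode("latin-1")] = value.decode("latin-1")
        elif chunk_type == b"IEND":
            break
        pos += 12 + length  # skip header, data, and CRC (CRC is not verified)
    return texts
```

For example, `read_png_text_chunks(open("Images/q4_k_4_0.png", "rb").read())` returns a dictionary of whatever text entries the file contains.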