---
license: other
license_name: sacla
license_link: >-
  https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md
base_model:
- stabilityai/stable-diffusion-3.5-large
base_model_relation: quantized
---
## Overview
These models are made to work with [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp), release [master-ac54e00](https://github.com/leejet/stable-diffusion.cpp/releases/tag/master-ac54e00) and onwards. Support for other inference backends is not guaranteed.

Quantized using [this PR](https://github.com/leejet/stable-diffusion.cpp/pull/447).

Normal K-quants do not work properly with SD3.5-Large models: around 90% of the weights are in tensors whose shapes are not a multiple of the 256-element superblock size of K-quants, so they cannot be quantized this way.
Mixing quantization types makes it possible to take advantage of the better fidelity of K-quants where possible while keeping the model file size relatively small.

Only the second layers of both MLPs in each MMDiT block of SD3.5-Large have a shape compatible with K-quants, but those still account for about 10% of all the parameters.
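
As an illustration, here is a minimal sketch (in Python, not the actual C++ code from the quantization PR) of the selection rule this implies: a tensor gets a K-quant only if its row size is a multiple of the 256-element superblock, and falls back to a legacy type such as q4_0 otherwise. The example shapes below are only illustrative.

```python
# Minimal sketch of the mixed-quantization rule (not the actual
# stable-diffusion.cpp implementation).
QK_K = 256  # superblock size shared by all K-quants

def pick_quant_type(shape: tuple[int, ...],
                    kquant: str = "q4_K", fallback: str = "q4_0") -> str:
    """Pick a quantization type for one tensor.

    K-quants pack weights into superblocks of 256 values along the rows,
    so a tensor is only eligible if its row size divides evenly by 256.
    In SD3.5-Large this is mainly the case for the second linear layer
    of each MLP inside the MMDiT blocks (~10% of the parameters).
    """
    row_size = shape[-1]       # number of weights per row
    if row_size % QK_K == 0:
        return kquant          # eligible for the higher-fidelity K-quant
    return fallback            # everything else keeps a legacy 32-block type

# Illustrative MLP shapes (out_features, in_features):
print(pick_quant_type((2432, 9728)))  # second MLP layer: 9728 % 256 == 0 -> q4_K
print(pick_quant_type((9728, 2432)))  # first MLP layer:  2432 % 256 != 0 -> q4_0
```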

## Files:

### Mixed Types:


- [sd3.5_large-q2_k_4_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-GGUF-mixed-sdcpp/blob/main/sd3.5_large-q2_k_4_0.gguf): Smallest quantization yet. Use this if you can't afford anything bigger.
- [sd3.5_large-q3_k_4_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-GGUF-mixed-sdcpp/blob/main/sd3.5_large-q3_k_4_0.gguf): Degraded, but usable at high step count.
- [sd3.5_large-q4_k_4_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-GGUF-mixed-sdcpp/blob/main/sd3.5_large-q4_k_4_0.gguf): Exactly the same size as q4_0, but with slightly less degradation. Recommended.
- [sd3.5_large_turbo-q4_k_4_1.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/sd3.5_large_turbo-q4_k_4_1.gguf): Smaller than q4_1, with comparable degradation. Recommended.
- [sd3.5_large_turbo-q4_k_5_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/sd3.5_large_turbo-q4_k_5_0.gguf): Smaller than q5_0, with comparable degradation. Already very close to the original f16. Recommended.

### Legacy types:

- [sd3.5_large_turbo-q4_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q4_0.gguf): Same size as q4_k_4_0. Not recommended (use q4_k_4_0 instead).
- [sd3.5_large_turbo-q4_1.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q4_1.gguf): Not recommended (q4_k_4_1 is better and smaller)
- [sd3.5_large_turbo-q5_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q5_0.gguf): Barely better than q4_k_5_0, and bigger.
- [sd3.5_large_turbo-q5_1.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q5_1.gguf): Better and bigger than q5_0
- [sd3.5_large_turbo-q8_0.gguf](https://huggingface.co/stduhpf/SD3.5-Large-Turbo-GGUF-mixed-sdcpp/blob/main/legacy/sd3.5_large_turbo-q8_0.gguf): Basically indistinguishable from the original f16, but much smaller. Recommended for best quality.
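
To check which quantization types actually ended up in a downloaded file, the per-tensor types can be listed with the `gguf` Python package (a small sketch; assumes `pip install gguf` and that one of the files above has been downloaded locally):

```python
# Inspect per-tensor quantization types in a downloaded GGUF file.
from collections import Counter
from gguf import GGUFReader

reader = GGUFReader("sd3.5_large-q4_k_4_0.gguf")

# Count how many tensors use each type (mixed files show both
# a K-quant type and a legacy type such as Q4_0).
counts = Counter(t.tensor_type.name for t in reader.tensors)
for qtype, n in counts.most_common():
    print(f"{qtype}: {n} tensors")
```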

## Outputs:

Sorted by model size (note that q4_0 and q4_k_4_0 are exactly the same size).


| Quantization       | Robot girl                       | Text                               | Cute kitten                        |
| ------------------ | -------------------------------- | ---------------------------------- | ---------------------------------- |
| q2_k_4_0           | ![q2_k_4_0](Images/q2_k_4_0.png) | ![q2_k_4_0](Images/1_q2_k_4_0.png) | ![q2_k_4_0](Images/2_q2_k_4_0.png) |
| q3_k_4_0           | ![q3_k_4_0](Images/q3_k_4_0.png) | ![q3_k_4_0](Images/1_q3_k_4_0.png) | ![q3_k_4_0](Images/2_q3_k_4_0.png) |
| q4_0               | ![q4_0](Images/q4_0.png)         | ![q4_0](Images/1_q4_0.png)         | ![q4_0](Images/2_q4_0.png)         |
| q4_k_4_0           | ![q4_k_4_0](Images/q4_k_4_0.png) | ![q4_k_4_0](Images/1_q4_k_4_0.png) | ![q4_k_4_0](Images/2_q4_k_4_0.png) |
| q4_k_4_1           | ![q4_k_4_1](Images/q4_k_4_1.png) | ![q4_k_4_1](Images/1_q4_k_4_1.png) | ![q4_k_4_1](Images/2_q4_k_4_1.png) |
| q4_1               | ![q4_1](Images/q4_1.png)         | ![q4_1](Images/1_q4_1.png)         | ![q4_1](Images/2_q4_1.png)         |
| q4_k_5_0           | ![q4_k_5_0](Images/q4_k_5_0.png) | ![q4_k_5_0](Images/1_q4_k_5_0.png) | ![q4_k_5_0](Images/2_q4_k_5_0.png) |
| q5_0               | ![q5_0](Images/q5_0.png)         | ![q5_0](Images/1_q5_0.png)         | ![q5_0](Images/2_q5_0.png)         |
| q5_1               | ![q5_1](Images/q5_1.png)         | ![q5_1](Images/1_q5_1.png)         | ![q5_1](Images/2_q5_1.png)         |
| q8_0               | ![q8_0](Images/q8_0.png)         | ![q8_0](Images/1_q8_0.png)         | ![q8_0](Images/2_q8_0.png)         |
| f16(sft)           | ![f16](Images/sft.png)           | ![f16](Images/1_sft.png)           | ![f16](Images/2_sft.png)           |

Only 28 steps, CFG scale 4.5.

Generated with a modified version of stable-diffusion.cpp with [this PR](https://github.com/leejet/stable-diffusion.cpp/pull/397) applied to enable support for CLIP timestep embeddings.

Text encoders used: a q4_k quant of t5xxl, full-precision clip_g, and a q8 quant of [ViT-L-14-TEXT-detail-improved-hiT-GmP-TE-only-HF](https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14) in place of clip_l.

Full prompts and settings are embedded in the PNG metadata.
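
A quick way to read those settings back, assuming standard PNG text chunks (a sketch using Pillow; the exact chunk keys written by stable-diffusion.cpp may vary):

```python
# Read the generation settings stored in the PNG text chunks.
# Requires Pillow (pip install Pillow); the file name is one of the
# sample images above.
from PIL import Image

im = Image.open("Images/q4_k_4_0.png")
im.load()  # make sure text chunks placed after the image data are parsed too
for key, value in im.info.items():
    print(f"{key}: {value}")
```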