lavawolfiee/Mixtral-8x7B-Instruct-v0.1-offloading-hqq-4bit-3bit
Text Generation · Transformers · Safetensors · 5 languages · mixtral · text-generation-inference · conversational · Inference Endpoints · License: mit
lavawolfiee committed on Dec 26, 2023 · Commit 380e4d8 · 1 parent: 35d82eb
Create README.md
Files changed (1): README.md (+2 / -0), ADDED
@@ -0,0 +1,2 @@
+Attention quantization: HQQ 4-bit, groupsize 64, compress zero, compress scale with groupsize 256 \
+Experts quantization: HQQ 3-bit, groupsize 64, compress zero, compress scale with groupsize 128