---
license: apache-2.0
---

This repository contains alternative quantized versions of [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) in GGUF format for use with `llama.cpp`.
The models are fully compatible with the official `llama.cpp` release and can be used out of the box.

I'm careful to say "alternative" rather than "better" or "improved" because I have not put any effort into evaluating performance
differences in actual usage. Perplexity is lower than with the "official" `llama.cpp` quantization, but perplexity is not
necessarily a good measure of real-world performance. Nevertheless, perplexity does measure quantization error, so the table below
compares the perplexities of these quantized models with those of the current `llama.cpp` quantization approach on Wikitext, using a context length of 512 tokens.
The "Quantization Error" columns in the table are defined as `(PPL(quantized model) - PPL(fp16))/PPL(fp16)`, i.e., the relative increase in perplexity over the fp16 model.
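Perplexity is the exponential of the average negative log-likelihood per token, so a model that assigns higher probability to the actual Wikitext tokens gets a lower score. A minimal sketch, assuming you already have the natural-log probability the model assigned to each scored token; the function name `perplexity` is illustrative, and the sketch does not reproduce how the `llama.cpp` perplexity tool splits the text into 512-token windows or which positions within each window it scores.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities (illustrative helper).

    PPL = exp(-(1/N) * sum_i log p(token_i | context_i))
    """
    if len(token_logprobs) == 0:
        raise ValueError("need at least one scored token")
    # Mean negative log-likelihood over all scored tokens.
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Toy check: a model that gives every observed token probability 0.25 is as
# uncertain as a uniform choice among 4 tokens, so its perplexity is about 4.
print(perplexity([math.log(0.25)] * 10))
```

Because quantization error is measured as a *relative* change in this quantity, it is independent of the absolute perplexity level of the fp16 model.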

| Quantization | Model file | PPL(llama.cpp) | Quantization Error (llama.cpp) | PPL(new quants) | Quantization Error (new quants) |
|--:|--:|--:|--:|--:|--:|
|Q3_K_S| mistral-instruct-7b-q3k-small.gguf | 6.9959 | 4.27% | 6.8920 | 2.72% |
|Q3_K_M| mistral-instruct-7b-q3k-medium.gguf | 6.8892 | 2.68% | 6.8089 | 1.48% |
|Q4_K_S| mistral-instruct-7b-q4k-small.gguf | 6.7649 | 0.82% | 6.7351 | 0.38% |
|Q5_K_S| mistral-instruct-7b-q5k-small.gguf | 6.7197 | 0.15% | 6.7186 | 0.13% |
|Q4_0| mistral-instruct-7b-q40.gguf | 6.7728 | 0.94% | 6.7191 | 0.14% |
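The fp16 reference perplexity is not listed in the table. As a consistency check (a sketch, not part of the original evaluation), the error definition can be inverted for each table cell, `PPL(fp16) = PPL(quantized) / (1 + error)`. In this check all ten cells imply an fp16 perplexity of roughly 6.71, agreeing to within 0.001, and recomputing both error columns from that value reproduces the reported percentages to two decimals. The dictionary and function names are illustrative only.

```python
# Perplexities from the table: (PPL(llama.cpp), PPL(new quants)).
PPL = {
    "Q3_K_S": (6.9959, 6.8920),
    "Q3_K_M": (6.8892, 6.8089),
    "Q4_K_S": (6.7649, 6.7351),
    "Q5_K_S": (6.7197, 6.7186),
    "Q4_0": (6.7728, 6.7191),
}

# Reported quantization errors in percent, same column order.
REPORTED_ERR_PCT = {
    "Q3_K_S": (4.27, 2.72),
    "Q3_K_M": (2.68, 1.48),
    "Q4_K_S": (0.82, 0.38),
    "Q5_K_S": (0.15, 0.13),
    "Q4_0": (0.94, 0.14),
}


def quantization_error(ppl_quantized: float, ppl_fp16: float) -> float:
    """(PPL(quantized model) - PPL(fp16)) / PPL(fp16), as a fraction."""
    return (ppl_quantized - ppl_fp16) / ppl_fp16


def implied_fp16_ppl(ppl_quantized: float, error_pct: float) -> float:
    """Invert the definition: PPL(fp16) = PPL(quantized) / (1 + error)."""
    return ppl_quantized / (1.0 + error_pct / 100.0)


# One fp16 estimate per table cell; they differ only through rounding of the percentages.
estimates = [
    implied_fp16_ppl(ppl, err)
    for name in PPL
    for ppl, err in zip(PPL[name], REPORTED_ERR_PCT[name])
]
ppl_fp16 = sum(estimates) / len(estimates)

# Recompute both error columns (in percent) from the averaged fp16 estimate.
recomputed_pct = {
    name: tuple(100.0 * quantization_error(ppl, ppl_fp16) for ppl in PPL[name])
    for name in PPL
}

print(f"implied PPL(fp16) ~ {ppl_fp16:.4f} (spread {max(estimates) - min(estimates):.4f})")
for name, (old, new) in recomputed_pct.items():
    print(f"{name:7s} llama.cpp {old:.2f}%  new quants {new:.2f}%")
```

Averaging the ten estimates, rather than trusting a single cell, reduces the effect of each percentage having been rounded to two decimals.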