bnjmnmarie committed • Commit 38765f2 • Parent(s): a7432d5
Update README.md
README.md CHANGED
@@ -12,7 +12,7 @@ license: apache-2.0
 
 This is [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) quantized with AutoRound (asymmetric quantization) to 4-bit. The model has been created, tested, and evaluated by The Kaitchup. It is compatible with the main inference frameworks, e.g., TGI and vLLM.
 
-Details on quantization process and evaluation:
+Details on the quantization process and evaluation:
 [Mistral-NeMo: 4.1x Smaller with Quantized Minitron](https://kaitchup.substack.com/p/mistral-nemo-41x-smaller-with-quantized)
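Since the card states the model is compatible with vLLM, a minimal serving sketch may help. Note that the quantized model's repository id is not given in this diff, so the id below is a hypothetical placeholder:

```shell
# Hedged sketch: serve the 4-bit AutoRound model with vLLM's OpenAI-compatible server.
# "kaitchup/Meta-Llama-3.1-8B-AutoRound-4bit" is a placeholder repo id, not confirmed by this commit;
# substitute the actual model id from the repository this README belongs to.
vllm serve kaitchup/Meta-Llama-3.1-8B-AutoRound-4bit --max-model-len 8192
```

Once the server is up, the model can be queried through the standard OpenAI-compatible `/v1/completions` endpoint.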