cmarkea/CodeLlama-70b-hf-4bit

Cyrile committed on Aug 9

Commit 3911988 (parent: 66f67ed)

Update README.md

Files changed (1)

README.md +7 -1

@@ -3,4 +3,10 @@ library_name: transformers
 license: llama2
 ---
-Converted version of [CodeLlama-70b](https://huggingface.co/meta-llama/CodeLlama-70b-hf) to 4-bit using bitsandbytes. For more information about the model, refer to the model's page.
+Converted version of [CodeLlama-70b](https://huggingface.co/meta-llama/CodeLlama-70b-hf) to 4-bit using bitsandbytes. For more information about the model, refer to the model's page.
+## Impact on performance
+In the following figure, we can see the impact on the performance of a set of models relative to the required RAM space. It is noticeable that the quantized models have equivalent performance while providing a significant gain in RAM usage.
+![constellation](https://i.postimg.cc/QdTqLr0Z/constellation.png)
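The RAM saving described in the updated README comes mostly from the bit width of the stored weights. The sketch below makes that arithmetic concrete under loose assumptions that are not stated in the model card: a nominal 70 billion parameters, every weight stored at the same width, and no allowance for the per-block scaling constants bitsandbytes keeps, layers left in 16-bit, activations, or the KV cache. Real memory use will therefore be somewhat higher than these figures.

```python
# Rough, weights-only memory estimate for a model stored at a given bit width.
# Assumptions (not from the model card): a nominal 70e9 parameters, all weights
# at one bit width, and no overhead for quantization constants, unquantized
# layers (e.g. embeddings), activations or the KV cache.


def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Return the size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_param / 8  # 8 bits per byte
    return bytes_total / 1e9


NUM_PARAMS = 70e9  # nominal, rounded size of CodeLlama-70b (assumption)

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # original half-precision weights
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: {fp16_gb:.0f} GB")
print(f"4-bit weights:  {int4_gb:.0f} GB")
print(f"reduction:      {fp16_gb / int4_gb:.0f}x")
```

Under these assumptions the script reports 140 GB for 16-bit weights and 35 GB for 4-bit weights, a 4x reduction. `weight_memory_gb` is a hypothetical helper written for this illustration; it is not part of transformers or bitsandbytes.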