OpenVINO/mixtral-8x7b-instruct-v0.1-int4-ov

katuni4ka committed on Apr 2

Commit 40760e5 • 1 Parent(s): e313eec

Update README.md

add note about weights compression parameters

Files changed (1)

README.md +10 -0

README.md CHANGED

@@ -10,6 +10,16 @@ language:
 ## Description
 This is the [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) model converted to the [OpenVINO](https://docs.openvino.ai/2024/home.html) Intermediate Representation (IR) format, with weights compressed to INT4 using [NNCF](https://github.com/openvinotoolkit/nncf).
+## Quantization Configuration
+Model weights were compressed to INT4 precision using `nncf.compress_weights` with the following parameters:
+* mode: **INT4_SYM**
+* group_size: **128**
+* ratio: **0.8**
+More details about the optimization parameters can be found in the [OpenVINO model optimization guide](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html).
 ## Compatibility
 This IR is compatible with OpenVINO starting with version 2024.0.0 and optimum-intel 1.16.0.
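
For reviewers who want to see what the new "Quantization Configuration" section corresponds to in code, here is a minimal sketch of an `nncf.compress_weights` call using the three values from the README. It is not taken from the commit and may differ from how the published IR was actually produced. The helper name `compress_to_int4` and the module-level constants are hypothetical, and the sketch assumes the model has already been loaded as an object NNCF accepts (for example an `openvino.Model`).

```python
# Values copied from the "Quantization Configuration" section of the README.
MODE_NAME = "INT4_SYM"
GROUP_SIZE = 128
RATIO = 0.8


def compress_to_int4(model):
    """Compress the weights of an already loaded model with NNCF.

    Hypothetical helper: `model` is assumed to be something NNCF supports,
    such as an openvino.Model. Loading the model is out of scope here.
    """
    # Imported lazily so the constants above can be inspected
    # on a machine where NNCF is not installed.
    import nncf

    return nncf.compress_weights(
        model,
        mode=getattr(nncf.CompressWeightsMode, MODE_NAME),
        group_size=GROUP_SIZE,  # weights share one scale per group of 128 values
        ratio=RATIO,            # fraction of layers compressed to 4 bits
    )
```

With `ratio=0.8`, roughly 80% of the eligible layers end up in INT4 and the remainder stay in NNCF's backup 8-bit precision. That is why the listed parameters matter to anyone trying to reproduce the file size or accuracy of this IR.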