Luke Merrick committed
Commit
1511436
1 Parent(s): 35df7db

Update README for int8 quantization tips

Files changed (1)
  1. README.md +2 -1
README.md CHANGED

```diff
@@ -7646,7 +7646,7 @@ Additionally, this model was designed to pair well with a corpus-independent sca
 | v1.5 | 256 | int8 | 256 (8.3%) | 54.2 (99%) | 3.9M (12x) |
 | v1.5 | 256 | int4 | 128 (4.2%) | 53.7 (98%) | 7.8M (24x) |
 
-NOTE: A good uniform scalar quantization range to use with this model (and which was used in the eval above) is -0.18 to +0.18. For a detailed walkthrough of int4 quantization with `snowflake-arctic-embed-m-v1.5`, check out our [example notebook on GitHub](https://github.com/Snowflake-Labs/arctic-embed/tree/main/compressed_embeddings_examples/score_arctic_embed_m_v1dot5_with_quantization.ipynb).
+NOTE: Good uniform scalar quantization ranges to use with this model (and which were used in the eval above) are -0.18 to +0.18 for 4-bit and -0.3 to +0.3 for 8-bit. For a detailed walkthrough of integer quantization with `snowflake-arctic-embed-m-v1.5`, check out our [example notebook on GitHub](https://github.com/Snowflake-Labs/arctic-embed/tree/main/compressed_embeddings_examples/score_arctic_embed_m_v1dot5_with_quantization.ipynb).
 
 ## Usage
 
@@ -7840,6 +7840,7 @@ console.log(similarities); // [0.15664823859882132, 0.24481869975470627]
 This model is designed to generate embeddings which compress well down to 128 bytes via a two-part compression scheme:
 1. Truncation and renormalization to 256 dimensions (a la Matryoshka Representation Learning, see [the original paper for reference](https://arxiv.org/abs/2205.13147)).
 2. 4-bit uniform scalar quantization of all 256 values to the same range (-0.18 to +0.18).
+   - For 8-bit uniform scalar quantization, the slightly wider range -0.3 to +0.3 tends to work slightly better given how much more granular 8-bit quantization is.
 
 For in-depth examples, check out our [arctic-embed GitHub repository](https://github.com/Snowflake-Labs/arctic-embed).
```
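The two-part compression scheme this commit documents can be sketched in a few lines of NumPy. This is an illustrative hand-rolled version, not the linked notebook's code; the function name and packing layout are my own, while the dimensions, bit widths, and quantization ranges come from the README text above:

```python
import numpy as np


def compress_embedding(embedding: np.ndarray, bits: int = 4) -> np.ndarray:
    """Sketch of the README's two-part compression scheme.

    1. Truncate to the first 256 dimensions and renormalize (MRL-style).
    2. Uniform scalar quantization of all 256 values to one shared range:
       -0.18..+0.18 for 4-bit, -0.3..+0.3 for 8-bit (per the commit).
    """
    # Part 1: truncation and renormalization to 256 dimensions.
    truncated = embedding[:256].astype(np.float64)
    truncated = truncated / np.linalg.norm(truncated)

    # Part 2: clip to the shared range, then map linearly onto the
    # available integer levels (15 for 4-bit, 255 for 8-bit).
    lo, hi = (-0.18, 0.18) if bits == 4 else (-0.3, 0.3)
    levels = 2**bits - 1
    clipped = np.clip(truncated, lo, hi)
    codes = np.round((clipped - lo) / (hi - lo) * levels).astype(np.uint8)

    if bits == 4:
        # Pack two 4-bit codes per byte: 256 dims -> 128 bytes total.
        codes = (codes[0::2] << 4) | codes[1::2]
    return codes
```

With `bits=4` the output is 128 bytes, matching the "compress well down to 128 bytes" claim; with `bits=8` it is 256 bytes, matching the int8 row of the table.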