Luke Merrick committed
Commit 1511436 • 1 Parent(s): 35df7db
Update README for int8 quantization tips
README.md CHANGED
@@ -7646,7 +7646,7 @@ Additionally, this model was designed to pair well with a corpus-independent sca
 | v1.5 | 256 | int8 | 256 (8.3%) | 54.2 (99%) | 3.9M (12x) |
 | v1.5 | 256 | int4 | 128 (4.2%) | 53.7 (98%) | 7.8M (24x) |
 
-NOTE:
+NOTE: Good uniform scalar quantization ranges to use with this model (and which were used in the eval above) are -0.18 to +0.18 for 4-bit and -0.3 to +0.3 for 8-bit. For a detailed walkthrough of using integer quantization with `snowflake-arctic-embed-m-v1.5`, check out our [example notebook on GitHub](https://github.com/Snowflake-Labs/arctic-embed/tree/main/compressed_embeddings_examples/score_arctic_embed_m_v1dot5_with_quantization.ipynb).
 
 ## Usage
 
@@ -7840,6 +7840,7 @@ console.log(similarities); // [0.15664823859882132, 0.24481869975470627]
 This model is designed to generate embeddings which compress well down to 128 bytes via a two-part compression scheme:
 1. Truncation and renormalization to 256 dimensions (a la Matryoshka Representation Learning; see [the original paper](https://arxiv.org/abs/2205.13147) for reference).
 2. 4-bit uniform scalar quantization of all 256 values to the same range (-0.18 to +0.18).
+- For 8-bit uniform scalar quantization, the slightly wider range of -0.3 to +0.3 tends to work a little better, given how much more granular 8-bit quantization is.
 
 For in-depth examples, check out our [arctic-embed GitHub repository](https://github.com/Snowflake-Labs/arctic-embed).
 
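To make the two-part compression scheme and the quantization ranges in the updated note concrete, here is a minimal NumPy sketch. It is not part of the commit or the linked notebook; the function names, the 4-bit nibble-packing layout, and the use of NumPy are illustrative assumptions. It truncates an embedding to 256 dimensions, renormalizes it, and applies uniform scalar quantization using the ranges quoted above (-0.18 to +0.18 for 4-bit, -0.3 to +0.3 for 8-bit).

```python
# Illustrative sketch only: compressing a float embedding to 128 bytes via
# truncation + renormalization + uniform scalar quantization. Names are hypothetical.
import numpy as np


def compress_embedding(embedding: np.ndarray, bits: int = 4) -> np.ndarray:
    """Truncate, renormalize, and uniform-scalar-quantize an embedding."""
    # 1. Truncate to the first 256 dimensions and renormalize to unit length
    #    (Matryoshka-style truncation).
    truncated = embedding[:256]
    truncated = truncated / np.linalg.norm(truncated)

    # 2. Uniform scalar quantization of all 256 values to one shared range
    #    (ranges quoted in the README note).
    lo, hi = (-0.18, 0.18) if bits == 4 else (-0.3, 0.3)
    levels = 2 ** bits  # 16 levels for 4-bit, 256 for 8-bit
    scaled = (np.clip(truncated, lo, hi) - lo) / (hi - lo)  # map to [0, 1]
    codes = np.round(scaled * (levels - 1)).astype(np.uint8)  # integer codes

    if bits == 4:
        # Pack two 4-bit codes per byte: 256 dims -> 128 bytes total.
        codes = (codes[0::2] << 4) | codes[1::2]
    return codes


def decompress_embedding(codes: np.ndarray, bits: int = 4) -> np.ndarray:
    """Approximately invert the quantization for scoring in float space."""
    lo, hi = (-0.18, 0.18) if bits == 4 else (-0.3, 0.3)
    levels = 2 ** bits
    if bits == 4:
        # Unpack high and low nibbles back into 256 codes.
        codes = np.stack([codes >> 4, codes & 0x0F], axis=1).reshape(-1)
    return codes.astype(np.float32) / (levels - 1) * (hi - lo) + lo


# Example usage with a hypothetical 768-dim unit-norm embedding:
rng = np.random.default_rng(0)
emb = rng.standard_normal(768).astype(np.float32)
emb /= np.linalg.norm(emb)
packed = compress_embedding(emb, bits=4)       # shape (128,), i.e. 128 bytes
approx = decompress_embedding(packed, bits=4)  # approximate 256-dim float vector
```

Because the quantization range is fixed for every vector (rather than fitted per vector or per corpus), compressed embeddings need no per-vector scale metadata and can be dequantized and scored the same way anywhere, which is consistent with the corpus-independent scaling the README pairs this model with.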