Update README.md
README.md CHANGED
@@ -2627,7 +2627,7 @@ We also present the [`gte-Qwen1.5-7B-instruct`](https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct)
| Models | Language | Model Size | Max Seq. Length | Dimension | MTEB-en | LoCo |
|:-----: | :-----: |:-----: |:-----: |:-----: | :-----: | :-----: |
|[`gte-Qwen1.5-7B-instruct`](https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct)| English | 7720 | 32768 | 4096 | 67.34 | 87.57 |
-|[`gte-large-en-v1.5`](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) | English |
+|[`gte-large-en-v1.5`](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) | English | 434 | 8192 | 1024 | 65.39 | 86.71 |
|[`gte-base-en-v1.5`](https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5) | English | 137 | 8192 | 768 | 64.11 | 87.44 |
@@ -2673,7 +2673,7 @@ from sentence_transformers.util import cos_sim
sentences = ['That is a happy person', 'That is a very happy person']

-model = SentenceTransformer('Alibaba-NLP/gte-large-en-v1.5')
+model = SentenceTransformer('Alibaba-NLP/gte-large-en-v1.5', trust_remote_code=True)
embeddings = model.encode(sentences)
print(cos_sim(embeddings[0], embeddings[1]))
```
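For reference, the embedding dimension and context length listed in the model table can be checked directly on the loaded model. This is a small illustrative sketch, assuming `sentence-transformers` is installed and the checkpoint downloads successfully; the printed values are expectations taken from the table above, not guaranteed output.

```python
from sentence_transformers import SentenceTransformer

# trust_remote_code=True is needed because the gte-v1.5 checkpoints ship custom modeling code.
model = SentenceTransformer('Alibaba-NLP/gte-large-en-v1.5', trust_remote_code=True)

# Values reported for gte-large-en-v1.5 in the model table above.
print(model.get_sentence_embedding_dimension())  # expected: 1024
print(model.max_seq_length)                      # expected: 8192
```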
@@ -2688,6 +2688,11 @@ print(cos_sim(embeddings[0], embeddings[1]))
### Training Procedure

+To enable the backbone model to support a context length of 8192, we adopted a multi-stage training strategy.
+The model first undergoes preliminary MLM pre-training on shorter sequence lengths.
+We then resample the data, reducing the proportion of short texts, and continue the MLM pre-training.
+
+The entire training process is as follows:
- MLM-512: lr 2e-4, mlm_probability 0.3, batch_size 4096, num_steps 300000, rope_base 10000
- MLM-2048: lr 5e-5, mlm_probability 0.3, batch_size 4096, num_steps 30000, rope_base 10000
- MLM-8192: lr 5e-5, mlm_probability 0.3, batch_size 1024, num_steps 30000, rope_base 160000
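Purely as an illustration of the staged schedule above, the sketch below restates the three stages as plain data and builds a masked-language-modeling collator with the stated 0.3 masking probability using Hugging Face `transformers`. The `MLM_STAGES` list and the `make_collator` helper are hypothetical names, not part of the released training code, and the loop only prints per-stage settings rather than running training.

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Hyper-parameters of the three MLM pre-training stages, copied from the list above.
MLM_STAGES = [
    {"name": "MLM-512",  "max_len": 512,  "lr": 2e-4, "mlm_probability": 0.3,
     "batch_size": 4096, "num_steps": 300_000, "rope_base": 10_000},
    {"name": "MLM-2048", "max_len": 2048, "lr": 5e-5, "mlm_probability": 0.3,
     "batch_size": 4096, "num_steps": 30_000,  "rope_base": 10_000},
    {"name": "MLM-8192", "max_len": 8192, "lr": 5e-5, "mlm_probability": 0.3,
     "batch_size": 1024, "num_steps": 30_000,  "rope_base": 160_000},
]

def make_collator(tokenizer, mlm_probability):
    # Standard MLM collator; 30% of tokens are masked, matching mlm_probability 0.3 in every stage.
    return DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=True, mlm_probability=mlm_probability
    )

tokenizer = AutoTokenizer.from_pretrained("Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)
for stage in MLM_STAGES:
    collator = make_collator(tokenizer, stage["mlm_probability"])
    # rope_base is raised to 160000 only in the final 8192-length stage; in an actual
    # run this would be set in the backbone's config before training starts.
    print(stage["name"], stage["max_len"], stage["rope_base"], type(collator).__name__)
```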
@@ -2700,7 +2705,9 @@ print(cos_sim(embeddings[0], embeddings[1]))
### MTEB

-The
+The results of other models are retrieved from the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard).
+
+The gte evaluation setting: `mteb==1.2.0`, fp16 auto mixed precision, `max_length=8192`, and the NTK scaling factor set to 2 (equivalent to rope_base * 2).

| Model Name | Param Size (M) | Dimension | Sequence Length | Average (56) | Class. (12) | Clust. (11) | Pair Class. (3) | Reran. (4) | Retr. (15) | STS (10) | Summ. (1) |
|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
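A minimal sketch of how the evaluation setting above could be reproduced with the `mteb` package, assuming the classic `MTEB` runner API from `mteb==1.2.0`. Casting the model to fp16 via `model.half()` is a rough stand-in for the fp16 mixed-precision setting, the single task name is an example rather than the full 56-task English suite, and the NTK scaling factor (rope_base * 2) is a model-config detail that this snippet only notes in a comment.

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)
model.max_seq_length = 8192   # max_length=8192 from the evaluation setting above
model.half()                  # rough stand-in for fp16 mixed-precision inference

# NOTE (assumption): the NTK scaling factor of 2 (rope_base * 2) is applied inside the
# model's rotary-embedding configuration and is not an option exposed by mteb itself.

# Run a single English STS task as an example; the reported averages cover all 56 English tasks.
evaluation = MTEB(tasks=["STSBenchmark"])
evaluation.run(model, output_folder="results/gte-large-en-v1.5")
```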