Junseong committed on
Commit
6626e43
1 Parent(s): 0757ec1

MOD: update README.md and Report for our update date

LinqAIResearch2024_Linq-Embed-Mistral.pdf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:163ef254faff9c9211973337e5ea02b758e20bca05b81382c430509ab62e3bd8
- size 7999613
+ oid sha256:56051a885496c4505443ffe4fff8db3bebded0793ba7d5a88bdc150694c2779f
+ size 7999944
README.md CHANGED
@@ -1779,7 +1779,7 @@ license: cc-by-nc-4.0
 
  Linq-Embed-Mistral has been developed by building upon the foundations of the [E5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) and [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) models. We focus on improving text retrieval using advanced data refinement methods, including sophisticated data crafting, data filtering, and negative mining guided by teacher models, which are highly tailored to each task, to improve the quality of the synthetic data generated by LLM. These methods are applied to both existing benchmark dataset and highly tailored synthetic dataset generated via LLMs. Our efforts primarily aim to create high-quality triplet datasets (query, positive example, negative example), significantly improving text retrieval performance.
 
- Linq-Embed-Mistral performs well in the MTEB benchmarks. The model excels in retrieval tasks, ranking <ins>**`1st`**</ins> among all models listed on the MTEB leaderboard with a performance score of <ins>**`60.2`**</ins>. This outstanding performance underscores its superior capability in enhancing search precision and reliability. The model achieves an average score of <ins>**`68.2`**</ins> across 56 datasets in the MTEB benchmarks, making it the highest-ranking publicly accessible model and third overall.
+ Linq-Embed-Mistral performs well in the MTEB benchmarks (as of May 29, 2024). The model excels in retrieval tasks, ranking <ins>**`1st`**</ins> among all models listed on the MTEB leaderboard with a performance score of <ins>**`60.2`**</ins>. This outstanding performance underscores its superior capability in enhancing search precision and reliability. The model achieves an average score of <ins>**`68.2`**</ins> across 56 datasets in the MTEB benchmarks, making it the highest-ranking publicly accessible model and third overall.
 
 
  This project is for research purposes only. Third-party datasets may be subject to additional terms and conditions under their associated licenses. Please refer to specific papers for more details:
@@ -1851,7 +1851,7 @@ Check out [unilm/e5](https://github.com/microsoft/unilm/tree/master/e5) to repro
 
  ## Evaluation Result
 
- ### MTEB
+ ### MTEB (as of May 29, 2024)
 
  | Model Name | Retrieval (15) | Average (56) |
  | :------------------------------------------------------------------------------: | :------------: | :----------: |