Could you provide the per-dataset results on MTEB?

by Kaguya-19 - opened May 9

May 9

I am an undergraduate student at Tsinghua University and am currently doing research on embedding models. Your work GritLM is exciting, the publicly available Medi2 dataset is useful, and the ablation experiments in it are very helpful.

I am currently using Medi2 for embedding model training, but the performance on retrieval tasks differs somewhat from your reported results. Could you provide the results for each dataset on MTEB? I want to check for errors in my training. Thank you again!

Muennighoff

GritLM org May 9

Sure, they should all be in here: https://huggingface.co/datasets/GritLM/results

Kaguya-19

May 10

Sorry, I wasn't very clear. The results at https://huggingface.co/datasets/GritLM/results appear to be from GritLM trained on the E5 dataset. Could you provide the per-dataset test results for the MEDI2-trained model? Thanks again!

Muennighoff

GritLM org May 10

They are all in there, e.g. these are the results from this model: https://huggingface.co/datasets/GritLM/results/tree/main/gritlm_m7_sq2048_medi2bge_bbcc
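For anyone wanting to compare against their own run, here is a minimal sketch of how that folder could be fetched and summarized. It assumes `huggingface_hub` is installed and that each result file is a JSON object with a top-level split key (e.g. `test`) holding metrics such as `ndcg_at_10`, as in older MTEB result files; that layout is an assumption, not something confirmed in this thread. The helper names `download_results` and `load_scores` are hypothetical.

```python
import json
from pathlib import Path

REPO_ID = "GritLM/results"  # dataset repo linked in this thread
MODEL_DIR = "gritlm_m7_sq2048_medi2bge_bbcc"  # model folder linked in this thread


def download_results(local_dir="gritlm_results"):
    """Fetch only the one model folder from the results dataset repo."""
    # Imported lazily so load_scores() below also works without the package.
    from huggingface_hub import snapshot_download

    root = snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # the results live in a dataset repo, not a model repo
        allow_patterns=[f"{MODEL_DIR}/*"],
        local_dir=local_dir,
    )
    return Path(root) / MODEL_DIR


def load_scores(folder, split="test", metric="ndcg_at_10"):
    """Map each JSON file's stem to one metric.

    Assumed layout (not confirmed here): {"<split>": {"<metric>": number, ...}}.
    Files that do not match this layout are skipped.
    """
    scores = {}
    for path in sorted(Path(folder).rglob("*.json")):
        with open(path) as f:
            data = json.load(f)
        split_scores = data.get(split) if isinstance(data, dict) else None
        if not isinstance(split_scores, dict):
            continue
        value = split_scores.get(metric)
        if isinstance(value, (int, float)):
            scores[path.stem] = value
    return scores
```

Calling `load_scores(download_results())` would then give a per-dataset dictionary that can be compared against one's own MTEB run. Some tasks may use a different split or main metric, so the `split` and `metric` arguments may need adjusting per task.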

Kaguya-19

May 13

I see, they are quite meaningful! Many thanks!

Kaguya-19 changed discussion status to closed May 13
