Spaces: mteb/leaderboard
New model and mteb leaderboard refresh request

#117

by nada5 - opened May 23

Discussion

nada5

May 23

Hi, Hugging Face MTEB team.

We submitted scores for a new embedding model at https://huggingface.co/nvidia/NV-Embed-v1. Could you help refresh the MTEB leaderboard?
Thank you!

Best,
Chankyu

tomaarsen

Massive Text Embedding Benchmark org May 23

Hello!

I've triggered a restart. Looking forward to hearing more about your model!
I'd love to assist from the Hugging Face side to get the biggest reach for your model at release. Perhaps I can add you to one of the Hugging Face Slack channels for easier communication on this?

It looks like you're missing some metrics for CQADupstackRetrieval (which consists of a few different datasets). Otherwise, your model is visible on the leaderboard.

Tom Aarsen

tomaarsen

Massive Text Embedding Benchmark org May 23

Also, based on your Classification scores (AmazonCounterfactualClassification, EmotionClassification), it seems plausible that the MTEB test sets accidentally leaked into your training set. For the former, you reach 95.12% accuracy compared to 88% for the 2nd highest (which may itself have been overfitted; most models reach 70-75%). For the latter, you reach 91.7% accuracy while the 2nd highest accuracy is 59.81% (scores there are low primarily because the Emotion dataset is not very high quality).

Tom Aarsen

nada5

May 23

Hi, Tom

Thanks for the quick response!

Thanks for pointing out the CQADupstackRetrieval issue. We have merged the CQADupstack*Retrieval results into one and updated the README. Could you please refresh the leaderboard again when you are available?
Yes, please add me to the Hugging Face Slack channel. My email is "[email protected]".
Since the training splits of EmotionClassification and AmazonCounterfactualClassification have some content similar to the evaluation splits, we used BM25 similarity thresholds to remove similar content from the training splits, and we also removed exact matches.
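
For readers unfamiliar with this kind of filtering, here is a minimal sketch of BM25-based decontamination. It is not the NV-Embed pipeline: the tokenizer, the `k1` and `b` values, the `threshold`, and the names `BM25` and `decontaminate` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercased whitespace tokenization (an assumption; real pipelines vary).
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 index over a fixed corpus (here: the evaluation split)."""

    def __init__(self, corpus, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(doc)) for doc in corpus]
        self.lengths = [sum(tf.values()) for tf in self.docs]
        self.avg_len = sum(self.lengths) / len(self.docs)
        n = len(self.docs)
        # Document frequency: in how many documents each term appears.
        df = Counter(term for tf in self.docs for term in tf)
        self.idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}

    def score(self, query, index):
        tf, length = self.docs[index], self.lengths[index]
        total = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            norm = tf[term] + self.k1 * (1 - self.b + self.b * length / self.avg_len)
            total += self.idf[term] * tf[term] * (self.k1 + 1) / norm
        return total

    def max_score(self, query):
        # Highest similarity between the query and any indexed document.
        return max(self.score(query, i) for i in range(len(self.docs)))


def decontaminate(train_texts, eval_texts, threshold):
    """Drop training texts that exactly match, or score >= threshold against, any eval text."""
    if not eval_texts:
        return list(train_texts)
    eval_exact = {" ".join(tokenize(t)) for t in eval_texts}
    bm25 = BM25(eval_texts)
    kept = []
    for text in train_texts:
        if " ".join(tokenize(text)) in eval_exact:
            continue  # exact match after normalization
        if bm25.max_score(text) >= threshold:
            continue  # near-duplicate by BM25 similarity
        kept.append(text)
    return kept


eval_split = ["i feel so happy today", "this makes me really angry"]
train_split = [
    "I feel so happy today",             # exact match (case-insensitive)
    "i feel so happy today friends",     # near-duplicate
    "the weather report mentions rain",  # unrelated
]
clean = decontaminate(train_split, eval_split, threshold=2.0)
```

The threshold is the sensitive part: set it too high and near-duplicates survive, too low and legitimate training examples are discarded.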

Best,
Chankyu

tomaarsen

Massive Text Embedding Benchmark org May 24

I've invited you to a Slack channel for this.
As for CQADupstackRetrieval, there are still some issues: your current README states an NDCG@10 of 5050.54, for example. My recommendation is to run merge_cqadupstack.py, which should take care of the merging for you. Once you've corrected it, I can restart the leaderboard and you'll show up again. For now I've removed the model, as it was scoring an average of ~158 out of 100 across all tasks 😄
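
As a rough illustration of what such a merge involves, the sketch below averages per-subset scores and rejects values outside the valid range. It is not the actual merge_cqadupstack.py script. It assumes the merged score is the unweighted mean of the subset scores, that raw results are fractions in [0, 1], and that the README reports them on a 0-100 scale. The function names, subset names, and score values are hypothetical.

```python
def merge_scores(subset_scores):
    """Unweighted mean of per-subset scores, each given as a fraction in [0, 1]."""
    if not subset_scores:
        raise ValueError("no subset scores to merge")
    for name, value in subset_scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}: expected a fraction in [0, 1], got {value}")
    return sum(subset_scores.values()) / len(subset_scores)


def to_percent(fraction):
    """Convert a fraction to the 0-100 scale exactly once and verify the range."""
    percent = round(fraction * 100, 2)
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"score {percent} is outside the 0-100 range")
    return percent


# Hypothetical NDCG@10 values for three subsets (illustrative numbers only).
ndcg_at_10 = {
    "subset_a": 0.48,
    "subset_b": 0.52,
    "subset_c": 0.50,
}
merged = to_percent(merge_scores(ndcg_at_10))
```

A range check like the one in `to_percent` would flag a value such as 5050.54 before it reaches the README.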

Tom Aarsen
