e5-R-mistral-7b for retrieval: request to refresh the results
Hi @tomaarsen @Muennighoff,
We submitted a new model, BeastyZ/e5-R-mistral-7b. Could you please refresh the space?
Thanks!
Beasty
Seems like it already shows up, likely via the automatic refresh. Curious why the result parsing did not work, though. Was it because of the `(default)` suffix? Did that get added by the script in the MTEB repo?
Yes, `(default)` was added by the script in the MTEB repo. I have now manually deleted `(default)` and am waiting for the next automatic refresh.
Oh, that seems like a bug, as it should work out of the box. I'll need to double check. cc @KennethEnevoldsen in case you know; this seems related to changes in the meta script.
@BeastyZ did you create the metadata using the CLI `mteb create_meta ...`? If so it should work (otherwise we have a bug to fix).
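For reference, a typical invocation looks something like this; the paths are placeholders and the exact flags may differ across versions, so check `mteb create_meta --help`:

```bash
# Regenerate the model-card metadata from a local results folder.
# Placeholder paths; point --results_folder at your own model's results.
mteb create_meta --results_folder results/BeastyZ__e5-R-mistral-7b --output_path model_card.md
```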
Hmm right, looking at the code it also seems like it is an error on our end. @Muennighoff we should probably allow for "(default)" for consistency with the other subsets. WDYT?
> we should probably allow for "(default)" for consistency with the other subsets.
It seems like we can either:

(a) Change the leaderboard code to allow `default`. The problem here is that we do not want `(default)` to appear in the leaderboard table, I think, as it is not very useful, but we do want languages to appear. So we would have to manually replace it somewhere in the code. This probably adds a line or two to the LB code here: https://github.com/embeddings-benchmark/leaderboard/blob/bef8d2ff6b420db179018d2a2689207aad180449/refresh.py#L325. The question is whether we want it to appear in the Evaluation results sidebar of models. It also seems not very useful there, so maybe there is no need, but then this solution would not be desirable.
(b) Change the mteb code not to add `default` to the name like here. This adds one line here: https://github.com/embeddings-benchmark/mteb/blob/778d7a3bf85b2023cc8ba9b2c35a810dcfa5e924/mteb/cli.py#L298. This is how it has worked thus far.
I don't have a strong preference, but given that `default` is not very useful info in the sidebar/metadata (note that it is still recorded in the config field, just not shown in the name) and it's how it has worked thus far, I'd go with (b). But happy to be disagreed with! :)
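To make the two options concrete, here is a minimal, hypothetical sketch; the function names and structure are illustrative, not the actual code in leaderboard/refresh.py or mteb/cli.py:

```python
# Hypothetical sketch of both options; names are illustrative only.

# (a) Leaderboard side: strip the trivial subset when rendering names, so
#     "Task (default)" displays as "Task" while "Task (fr)" stays intact.
def display_name(raw_name: str) -> str:
    return raw_name.removesuffix(" (default)")

# (b) mteb side: only append a subset to the task name when it is a real
#     language/subset, never the lone "default" subset, so it is not written
#     into the model-card metadata in the first place.
def result_name(task_name: str, subset: str) -> str:
    return task_name if subset == "default" else f"{task_name} ({subset})"

assert display_name("PawsXPairClassification (default)") == "PawsXPairClassification"
assert result_name("PawsXPairClassification", "fr") == "PawsXPairClassification (fr)"
```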
I manually deleted the `(default)` suffix 24 hours ago, but my model, e5-R-mistral-7b, still hasn't appeared on the retrieval leaderboard. Why is that?
The latest refresh failed: https://github.com/embeddings-benchmark/leaderboard/actions/runs/9884681390
Apologies while we work out the kinks of the new automatic refresh.
cc @KennethEnevoldsen @orionweller: this is regarding the `PawsXPairClassification (fr)` key not being found.
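For illustration, the failure is a hard lookup on a missing key; a guard along these lines (hypothetical names, not the actual refresh.py fix) would skip the entry instead of aborting the whole refresh:

```python
# Hypothetical guard; the dict and key names are assumptions, not the real
# refresh.py structure. Placeholder data, for demonstration only.
task_scores: dict[str, float] = {"PawsXPairClassification (en)": 0.0}
score = task_scores.get("PawsXPairClassification (fr)")
if score is None:
    print("warning: missing key 'PawsXPairClassification (fr)'; skipping")
```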
- Tom Aarsen
Yes, sorry about this @BeastyZ! Pushing a fix now.
Making an issue on the leaderboard GitHub to consolidate this: https://github.com/embeddings-benchmark/leaderboard/issues/8
@KennethEnevoldsen @Muennighoff @orionweller @tomaarsen
Thank you for your timely and kind help! Things are moving in a positive direction. I only want to add my model to the retrieval leaderboard. Many scores appear, but `Average` and `CQADupstackRetrieval` are not among them.
Hey @BeastyZ! In the GitHub issue I referenced earlier, I pointed this out and tagged you there (or so I thought; perhaps I got the wrong GitHub handle). I agree it's an issue!
Is it okay if we move the discussion there? We're trying to move away from using the Spaces for PRs/discussion.
FWIW @BeastyZ, the issue is that you don't seem to have a `main_score` for MTEB `CQADupstackRetrieval`. I think you need to aggregate the individual CQADupstack sub-task scores.
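As a rough sketch of that aggregation: on the leaderboard, `CQADupstackRetrieval` is conventionally the mean of the twelve CQADupstack sub-task main scores (`ndcg_at_10` for retrieval). The task list matches the standard MTEB suite, but the dict layout and helper name below are my own assumptions, not the exact leaderboard code:

```python
# Sketch: aggregate the twelve CQADupstack sub-task main scores
# (ndcg_at_10) into a single "CQADupstackRetrieval" number.
CQA_SUBTASKS = [
    "CQADupstackAndroidRetrieval", "CQADupstackEnglishRetrieval",
    "CQADupstackGamingRetrieval", "CQADupstackGisRetrieval",
    "CQADupstackMathematicaRetrieval", "CQADupstackPhysicsRetrieval",
    "CQADupstackProgrammersRetrieval", "CQADupstackStatsRetrieval",
    "CQADupstackTexRetrieval", "CQADupstackUnixRetrieval",
    "CQADupstackWebmastersRetrieval", "CQADupstackWordpressRetrieval",
]

def cqadupstack_main_score(main_scores: dict[str, float]) -> float:
    """Mean of the sub-task main scores; raises KeyError if any are missing."""
    return sum(main_scores[task] for task in CQA_SUBTASKS) / len(CQA_SUBTASKS)
```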
Will close this issue again and refer to https://github.com/embeddings-benchmark/leaderboard/issues/8