Add German part (already added in MTEB)
Hi all - it seems like there's a lot of progress happening at the GitHub repo, and German is now supported as well:
https://github.com/embeddings-benchmark/mteb/pull/214
Could we include this here as well? :-)
Yeah it is on the roadmap :)
Hi Niklas! Is there any way I can help you with that? I'm currently on vacation, but afterwards I'd be up for this. Or do you have specific timing plans for when you would like to publish it? :)
It'd be great to have you! It will be added as part of this project: https://github.com/embeddings-benchmark/mteb/tree/main/docs/mmteb
Help from you in adding new German datasets, reviewing PRs, solving outstanding issues etc. would be amazing! :)
I already did that with the false friends dataset, and I'm also looking into how we can contribute from the industry side! :)
Amazing! There are many other areas where we could use your help if you want; maybe @KennethEnevoldsen can point you to something to work on :)
@aari1995 we have been discussing updating the leaderboard to allow filtering by e.g. domain, language, etc., along with a few standard benchmarks as well. This reformatting step might be the best place to add the German benchmark to the GUI. Feel free to start a discussion over at MTEB if this is something you are interested in. Alternatively, you can create a PR akin to e.g. https://huggingface.co/spaces/mteb/leaderboard/discussions/102, adding a tab for the German benchmark.
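For context, the leaderboard is a Gradio app, so adding a tab roughly amounts to something like the sketch below. The task names and scores here are placeholders, not the actual leaderboard code or data:

```python
import gradio as gr
import pandas as pd

# Placeholder results; the real leaderboard loads scores from the MTEB results.
german_results = pd.DataFrame(
    {
        "Model": ["model-a", "model-b"],
        "GermanSTSBenchmark": [81.2, 79.5],        # hypothetical scores
        "FalseFriendsGermanEnglish": [64.0, 61.3],  # hypothetical scores
    }
)

with gr.Blocks() as demo:
    with gr.Tab("German"):  # new tab alongside the existing benchmark tabs
        gr.Markdown("## German benchmark")
        gr.Dataframe(value=german_results)

if __name__ == "__main__":
    demo.launch()
```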
Hi @KennethEnevoldsen, I currently have a version where I worked in most of it. However, the FalseFriends pair classification, for example, does not seem to work (KeyError). Is it maybe because a) there are no results in the list and/or b) there are no results on HF? Or do I need to do something specific to add datasets, apart from adding them to the config.yaml?
Should I already open a PR for the overview with the working ones, or wait and add the others, for example by running models on them? (The PR would be: https://huggingface.co/spaces/mteb/leaderboard/discussions/113)
I believe it is probably due to the lack of results. You might need to run at least one model on it.
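For reference, a minimal sketch of producing results for a single task with the mteb package. The model name and output folder are just examples, and the exact registered name of the false friends task is an assumption, so check the task list in the mteb repo:

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Any SentenceTransformer-compatible model works; this one is just an example.
model_name = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
model = SentenceTransformer(model_name)

# Task name is assumed; verify it against the tasks registered in the mteb repo.
evaluation = MTEB(tasks=["FalseFriendsGermanEnglish"])

# Writes per-task JSON result files that the leaderboard can then pick up.
evaluation.run(model, output_folder=f"results/{model_name.split('/')[-1]}")
```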