Add German part (already added in MTEB)
Hi all - it seems like there's a lot of progress happening at the GitHub repo, and German is now supported as well:
https://github.com/embeddings-benchmark/mteb/pull/214
Could we include this here as well? :-)
Yeah it is on the roadmap :)
Hi Niklas! Is there any way I can help you with that? I'm currently on vacation, but afterwards I'd be up for this. Or do you have specific timing plans for when you would like to publish it? :)
It'd be great to have you! It will be added as part of this project: https://github.com/embeddings-benchmark/mteb/tree/main/docs/mmteb
Help from you in adding new German datasets, reviewing PRs, solving outstanding issues etc. would be amazing! :)
I already did that with the false friends dataset, and I'm also looking into how we can contribute from the industry side! :)
Amazing! There are many other areas where we could use your help if you want; maybe @KennethEnevoldsen can point you to something to work on :)
@aari1995 we have been discussing updating the leaderboard to allow filtering by e.g. domain, language, etc., along with a few standard benchmarks as well. This reformatting step might be the best place to add the German benchmark to the GUI. Feel free to start a discussion over at MTEB if this is something you are interested in. Alternatively, you can create a PR akin to e.g. https://huggingface.co/spaces/mteb/leaderboard/discussions/102, adding a tab for the German benchmark.
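For context, the leaderboard is a Gradio app, so adding a tab roughly amounts to something like the sketch below. The task names and scores here are placeholders, not the actual leaderboard code or data:

```python
import gradio as gr
import pandas as pd

# Placeholder results; the real leaderboard loads scores from the MTEB results.
german_results = pd.DataFrame(
    {
        "Model": ["model-a", "model-b"],
        "GermanSTSBenchmark": [81.2, 79.5],        # hypothetical scores
        "FalseFriendsGermanEnglish": [64.0, 61.3],  # hypothetical scores
    }
)

with gr.Blocks() as demo:
    with gr.Tab("German"):  # new tab alongside the existing benchmark tabs
        gr.Markdown("## German benchmark")
        gr.Dataframe(value=german_results)

if __name__ == "__main__":
    demo.launch()
```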
Hi @KennethEnevoldsen, I currently have a version where I worked in most of it. However, the FalseFriends pair classification, for example, does not seem to work (KeyError). Is it maybe because a) there are no results in the list and/or b) there are no results on HF? Or do I need to do something specific to add datasets, apart from adding them to the config.yaml?
Should I already open a PR for the overview with the working ones, or wait and add the others, for example by running models on them? (The PR would be: https://huggingface.co/spaces/mteb/leaderboard/discussions/113)
I believe it is probably due to the lack of results. You might need to run at least one model on it.
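For reference, a minimal sketch of producing results for a single task with the mteb package. The model name and output folder are just examples, and the exact registered name of the false friends task is an assumption, so check the task list in the mteb repo:

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Any SentenceTransformer-compatible model works; this one is just an example.
model_name = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
model = SentenceTransformer(model_name)

# Task name is assumed; verify it against the tasks registered in the mteb repo.
evaluation = MTEB(tasks=["FalseFriendsGermanEnglish"])

# Writes per-task JSON result files that the leaderboard can then pick up.
evaluation.run(model, output_folder=f"results/{model_name.split('/')[-1]}")
```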