Restart the space for new models
Hi @Muennighoff,
Thanks for the great work!
I submitted two new Chinese text embedding models, "stella-base-zh" and "stella-large-zh". Could you help restart this space?
Thanks!
Hi @Muennighoff, thanks for taking the time to restart the leaderboard cache. Could you also refresh the leaderboard this time?
I have another query:
- How can I also get a German leaderboard for MTEB, like the ones for the CH and PO languages that you have in the GitHub repo?
Done!
We can add a German leaderboard, but there aren't many German datasets at this point, so I would wait for more first.
Okay, understood. I will try to generate a German dataset using translation.
If we consider knowledge distillation techniques for text representations in other languages, do you think it's worth it for the semantic search task? @Muennighoff
If you do human translation, it's fine; a high-quality machine translation may be OK, too (BEIR-PL is machine-translated).
By semantic search do you mean Retrieval?
Yes, exactly for the retrieval process.
Then, to correct the ranking of the results for the user query, I use a cross-encoder, which works perfectly.
But while the default pre-trained model sometimes worked well with German text, it performed badly on some complicated sentences.
That is why I thought to use knowledge distillation.
I see. Not sure, maybe it could work.
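For readers following along, the two-stage setup described above (retrieval first, then re-ranking the candidates with a cross-encoder) can be sketched roughly as below. This is only an illustration under stated assumptions: `embed`, `cosine`, `retrieve`, `toy_pair_score`, and `rerank` are hypothetical names, the bag-of-words vectors stand in for a real embedding model, and the bigram-overlap scorer stands in for a real cross-encoder. None of this is the code used in the thread.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, top_k):
    """Stage 1: rank all documents by embedding similarity, keep the top_k."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:top_k]


def toy_pair_score(query, doc):
    """Toy stand-in for a cross-encoder: scores the (query, doc) pair jointly
    by counting query bigrams that also occur in the document, so word order
    matters (unlike the bag-of-words retrieval stage)."""
    def bigrams(text):
        tokens = text.lower().split()
        return set(zip(tokens, tokens[1:]))
    return len(bigrams(query) & bigrams(doc))


def rerank(query, candidates, pair_score):
    """Stage 2: re-order the retrieved candidates with the pairwise scorer."""
    return sorted(candidates, key=lambda doc: pair_score(query, doc), reverse=True)


corpus = [
    "der hund jagt die katze",
    "die katze jagt den hund",
    "das wetter ist heute gut",
]
query = "katze jagt hund"

candidates = retrieve(query, corpus, top_k=2)
print(rerank(query, candidates, toy_pair_score))
```

In this toy example the first stage cannot tell the two animal sentences apart, because both contain the same query words, while the pairwise scorer can, because it looks at the query and the document together. In a real system each stand-in would be replaced by an actual model.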