Can't reproduce QuoraRetrieval NDCG@10 score
thanks.
I want to reproduce the mteb/retrieval result for QuoraRetrieval, but I get an NDCG@10 score of 80.73.
I have confirmed that the query embeddings use the prompt and the document embeddings do not.
The NDCG@10 scores on other datasets reproduce fine, for example SCIDOCS and ArguAna.
QuoraRetrieval is a duplicate question retrieval task, i.e. matching queries to other queries instead of queries to documents. As such, we follow the common practice of using the query prefix for both queries and documents when embedding this dataset (this was not our brilliant idea by any means, it goes back to the E5 paper at least -- see their Appendix B).
I do not believe this was properly documented anywhere, though, even in our tech report. My apologies for the oversight!
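For anyone else trying to reproduce this, here is a minimal sketch of the symmetric setup described above. It only shows the text preparation step: the query prefix is prepended to the documents as well when the task is QuoraRetrieval. The names `QUERY_PREFIX`, `SYMMETRIC_TASKS`, and `prepare_texts` are hypothetical, and the prefix string is a placeholder, so substitute whatever query prompt your model actually uses. This is not the exact evaluation code, just an illustration of the idea.

```python
from typing import List, Tuple

# Placeholder prefix (assumption): replace with your model's real query prompt.
QUERY_PREFIX = "query: "

# Assumption: tasks listed here match queries to other queries,
# so both sides get the query prefix.
SYMMETRIC_TASKS = {"QuoraRetrieval"}


def prepare_texts(
    task_name: str,
    queries: List[str],
    documents: List[str],
    query_prefix: str = QUERY_PREFIX,
) -> Tuple[List[str], List[str]]:
    """Return (queries, documents) with prefixes applied before encoding."""
    # Queries always get the query prefix.
    prefixed_queries = [query_prefix + q for q in queries]
    if task_name in SYMMETRIC_TASKS:
        # Duplicate-question retrieval: treat documents like queries.
        prefixed_documents = [query_prefix + d for d in documents]
    else:
        # Ordinary retrieval: documents are embedded without a prompt.
        prefixed_documents = list(documents)
    return prefixed_queries, prefixed_documents


if __name__ == "__main__":
    q, d = prepare_texts(
        "QuoraRetrieval",
        ["How do I learn Python?"],
        ["What is the best way to learn Python?"],
    )
    print(q)
    print(d)
```

The prepared lists would then be passed to your encoder in place of the raw texts; for asymmetric tasks such as SCIDOCS or ArguAna the documents come back unchanged.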
You should see if this symmetrical embedding improves your organization's Stella models' scores on QuoraRetrieval, too, if you haven't yet!
(And good luck with the write-up for that one -- we're looking forward to reading it when it's ready!)
Thanks, got it.