philipphager/baidu-ultr_uva-bert_ips-listwise


philipphager committed on May 1

Commit b40d40b • 1 Parent(s): 40c0ec6

Update README.md

Files changed (1)

README.md +1 -1


@@ -23,7 +23,7 @@ co2_eq_emissions:
 A flax-based MonoBERT cross encoder trained on the [Baidu-ULTR](https://arxiv.org/abs/2207.03051) dataset with a **listwise softmax cross-entropy loss with IPS correction** adopted based on the work by [Ai et al](https://arxiv.org/abs/1804.05938). The loss uses inverse propensity scoring to mitigate position bias in click data by weighting clicks on items higher that are less likely to be observed by users. For more info, [read our paper](https://arxiv.org/abs/2404.02543) and [find the code for this model here](https://github.com/philipphager/baidu-bert-model).
 ## Test Results on Baidu-ULTR
-Ranking performance is measured in DCG, nDCG, and MRR on expert annotations (6,985 queries). Click prediction performance is measured in log-likelihood on one test partition of user clicks (49,495 queries).
+Ranking performance is measured in DCG, nDCG, and MRR on expert annotations (6,985 queries). Click prediction performance is measured in log-likelihood on one test partition of user clicks (≈297k queries).
 | Model                                                                                          | Log-likelihood | DCG@1 | DCG@3 | DCG@5 | DCG@10 | nDCG@10 | MRR@10 |
 |------------------------------------------------------------------------------------------------|----------------|-------|-------|-------|--------|---------|--------|
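
For readers unfamiliar with the loss named in the README text above, the following is a minimal JAX sketch of a listwise softmax cross-entropy with IPS weighting. It is an illustration under stated assumptions, not the code from the linked repository: the function name `ips_listwise_softmax_loss`, the argument layout, and the choice to average over queries are all hypothetical, and padding masks and propensity clipping are omitted for brevity.

```python
import jax
import jax.numpy as jnp


def ips_listwise_softmax_loss(scores, clicks, propensities):
    """Hypothetical sketch of an IPS-weighted listwise softmax loss.

    scores:       [n_queries, n_docs] model relevance scores
    clicks:       [n_queries, n_docs] observed clicks (0 or 1)
    propensities: [n_queries, n_docs] estimated probability that the
                  user examined each position (assumed to be > 0)
    """
    # Clicks at positions that are rarely examined get a larger weight.
    weights = clicks / propensities
    # Log of the softmax over all documents shown for the same query.
    log_probs = jax.nn.log_softmax(scores, axis=-1)
    # Weighted cross-entropy per query, averaged over queries (assumption).
    per_query = -(weights * log_probs).sum(axis=-1)
    return per_query.mean()


# Toy example: one query, three documents, a click on the third position.
scores = jnp.array([[2.0, 1.0, 0.5]])
clicks = jnp.array([[0.0, 0.0, 1.0]])
propensities = jnp.array([[1.0, 0.6, 0.3]])

loss = ips_listwise_softmax_loss(scores, clicks, propensities)
grads = jax.grad(ips_listwise_softmax_loss)(scores, clicks, propensities)
```

In this toy example the click at the third position is weighted by 1 / 0.3, so it contributes more to the loss than the same click would at the first position, which is the bias correction the README text describes.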