|
---
license: mit
datasets:
- philipphager/baidu-ultr-pretrain
- philipphager/baidu-ultr_uva-mlm-ctr
metrics:
- log-likelihood
- dcg@1
- dcg@3
- dcg@5
- dcg@10
- ndcg@10
- mrr@10
co2_eq_emissions:
  emissions: 2090
  source: "Calculated using the [ML CO2 impact calculator](https://mlco2.github.io/impact/#compute), training for 4 x 45 hours with a carbon efficiency of 0.029 kg/kWh. You can inspect the carbon efficiency of the French national grid provider here: https://www.rte-france.com/eco2mix/les-emissions-de-co2-par-kwh-produit-en-france"
  training_type: "Pre-training"
  geographical_location: "Grenoble, France"
  hardware_used: "4 NVIDIA H100-80GB GPUs"
---
|
|
|
# Two Tower MonoBERT trained on Baidu-ULTR |
|
A Flax-based MonoBERT cross encoder trained on the [Baidu-ULTR](https://arxiv.org/abs/2207.03051) dataset with an **additive two-tower architecture** as suggested by [Yan et al.](https://research.google/pubs/revisiting-two-tower-models-for-unbiased-learning-to-rank/). Similar to a position-based click model (PBM), a two-tower model jointly learns item relevance (with a BERT model) and position bias (in our case, using a single embedding per rank). For more details, [read our paper](https://arxiv.org/abs/2404.02543) and [find the code for this model here](https://github.com/philipphager/baidu-bert-model).
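
Conceptually, the model predicts a click as the sigmoid of the sum of a relevance logit and a rank-dependent examination logit. Below is a minimal Flax sketch of this additive combination; the `AdditiveTwoTower` module, its linear relevance head, and the precomputed document embeddings are illustrative assumptions, while the released model uses the full MonoBERT cross encoder as its relevance tower.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn


class AdditiveTwoTower(nn.Module):
    """Illustrative additive two-tower click model (not the released code)."""
    max_rank: int = 10

    @nn.compact
    def __call__(self, doc_embeddings: jnp.ndarray, positions: jnp.ndarray):
        # Relevance tower: one logit per document, independent of its rank.
        # Here a linear head over precomputed embeddings stands in for MonoBERT.
        relevance = nn.Dense(features=1)(doc_embeddings).squeeze(-1)
        # Position tower: a single learned bias logit per rank (1-indexed).
        examination = nn.Embed(num_embeddings=self.max_rank + 1, features=1)(positions).squeeze(-1)
        # Additive combination in logit space yields the click probability.
        return jax.nn.sigmoid(relevance + examination)


model = AdditiveTwoTower()
doc_embeddings = jnp.ones((4, 768))  # four mock document embeddings
positions = jnp.array([1, 2, 3, 4])  # their ranks in the SERP
params = model.init(jax.random.PRNGKey(0), doc_embeddings, positions)
clicks = model.apply(params, doc_embeddings, positions)  # shape (4,)
```

This additive combination in logit space distinguishes the model from a classic multiplicative PBM, in which the click probability factorizes into an examination probability times a relevance probability.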
|
|
|
## Test Results on Baidu-ULTR |
|
Ranking performance is measured in DCG, nDCG, and MRR on expert annotations (6,985 queries). Click prediction performance is measured in log-likelihood on one test partition of user clicks (≈297k queries). |
|
|
|
| Model | Log-likelihood | DCG@1 | DCG@3 | DCG@5 | DCG@10 | nDCG@10 | MRR@10 |
|-------|----------------|-------|-------|-------|--------|---------|--------|
| [Pointwise Naive](https://huggingface.co/philipphager/baidu-ultr_uva-bert_naive-pointwise) | 0.227 | 1.641 | 3.462 | 4.752 | 7.251 | 0.357 | 0.609 |
| [Pointwise Two-Tower](https://huggingface.co/philipphager/baidu-ultr_uva-bert_twotower) | 0.218 | 1.629 | 3.471 | 4.822 | 7.456 | 0.367 | 0.607 |
| [Pointwise IPS](https://huggingface.co/philipphager/baidu-ultr_uva-bert_ips-pointwise) | 0.222 | 1.295 | 2.811 | 3.977 | 6.296 | 0.307 | 0.534 |
| [Listwise Naive](https://huggingface.co/philipphager/baidu-ultr_uva-bert_naive-listwise) | - | 1.947 | 4.108 | 5.614 | 8.478 | 0.405 | 0.639 |
| [Listwise IPS](https://huggingface.co/philipphager/baidu-ultr_uva-bert_ips-listwise) | - | 1.671 | 3.530 | 4.873 | 7.450 | 0.361 | 0.603 |
| [Listwise DLA](https://huggingface.co/philipphager/baidu-ultr_uva-bert_dla) | - | 1.796 | 3.730 | 5.125 | 7.802 | 0.377 | 0.615 |
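
For reference, the ranking metrics in the table can be computed from expert relevance labels ordered by the model's scores. The following is a minimal sketch; the helpers `dcg_at_k`, `ndcg_at_k`, and `mrr_at_k` are illustrative and not part of the released code, and the reported numbers come from the evaluation script linked below.

```python
import jax.numpy as jnp


def dcg_at_k(relevance: jnp.ndarray, k: int) -> jnp.ndarray:
    # relevance: expert labels sorted by model score (best-ranked first).
    ranks = jnp.arange(1, k + 1)
    return jnp.sum(relevance[:k] / jnp.log2(ranks + 1))


def ndcg_at_k(relevance: jnp.ndarray, k: int) -> jnp.ndarray:
    # Normalize by the DCG of the ideal (label-sorted) ranking.
    ideal = jnp.sort(relevance)[::-1]
    return dcg_at_k(relevance, k) / dcg_at_k(ideal, k)


def mrr_at_k(relevance: jnp.ndarray, k: int, threshold: float = 1.0) -> jnp.ndarray:
    # Reciprocal rank of the first document with a label >= threshold.
    hits = relevance[:k] >= threshold
    ranks = jnp.arange(1, k + 1)
    return jnp.where(hits.any(), 1.0 / ranks[jnp.argmax(hits)], 0.0)


labels = jnp.array([1.0, 0.0, 3.0, 2.0, 0.0])  # mock labels in ranked order
print(dcg_at_k(labels, 5), ndcg_at_k(labels, 5), mrr_at_k(labels, 5))
```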
|
|
|
|
|
## Usage |
|
Here is an example of downloading the model and calling it for inference on a mock batch of input data. For more details on how to use the model on the Baidu-ULTR dataset, take a look at our [training](https://github.com/philipphager/baidu-bert-model/blob/main/main.py) and [evaluation scripts](https://github.com/philipphager/baidu-bert-model/blob/main/eval.py) in our code repository. |
|
|
|
```python
import jax.numpy as jnp

from src.model import PBMCrossEncoder

model = PBMCrossEncoder.from_pretrained(
    "philipphager/baidu-ultr_uva-bert_twotower",
)

# Mock batch following Baidu-ULTR with 4 documents, each with 8 tokens
batch = {
    # Query id for each document
    "query_id": jnp.array([1, 1, 1, 1]),
    # Document position in the SERP
    "positions": jnp.array([1, 2, 3, 4]),
    # Token ids for: [CLS] Query [SEP] Document
    "tokens": jnp.array([
        [2, 21448, 21874, 21436, 1, 20206, 4012, 2860],
        [2, 21448, 21874, 21436, 1, 16794, 4522, 2082],
        [2, 21448, 21874, 21436, 1, 20206, 10082, 9773],
        [2, 21448, 21874, 21436, 1, 2618, 8520, 2860],
    ]),
    # Specify if a token id belongs to the query (0) or document (1)
    "token_types": jnp.array([
        [0, 0, 0, 0, 1, 1, 1, 1],
        [0, 0, 0, 0, 1, 1, 1, 1],
        [0, 0, 0, 0, 1, 1, 1, 1],
        [0, 0, 0, 0, 1, 1, 1, 1],
    ]),
    # Mark if a token should be attended to (True) or ignored, e.g., padding tokens (False)
    "attention_mask": jnp.array([
        [True, True, True, True, True, True, True, True],
        [True, True, True, True, True, True, True, True],
        [True, True, True, True, True, True, True, True],
        [True, True, True, True, True, True, True, True],
    ]),
}

outputs = model(batch, train=False)
print(outputs)
```
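
Note that at inference time the two towers can be decoupled: the position tower captures where a result was displayed rather than what it contains, so ranking is typically done with the relevance tower alone. The exact fields returned in `outputs` (e.g., separate click and relevance scores) are defined in the model code in our repository.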
|
|
|
## Reference |
|
```
@inproceedings{Hager2024BaiduULTR,
  author = {Philipp Hager and Romain Deffayet and Jean-Michel Renders and Onno Zoeter and Maarten de Rijke},
  title = {Unbiased Learning to Rank Meets Reality: Lessons from Baidu's Large-Scale Search Dataset},
  booktitle = {Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24)},
  organization = {ACM},
  year = {2024},
}
```
|
|
|
|