metadata

license: gemma
pipeline_tag: text-classification
tags:
  - transformers
  - sentence-transformers
language:
  - multilingual

Reranker

For more details, please refer to our GitHub repository: FlagEmbedding.

Model List
Usage
Fine-tuning
Evaluation
Citation

Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by passing a query and a passage to the reranker, and the score can be mapped to a float value in [0,1] with the sigmoid function.
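The sigmoid mapping mentioned above can be sketched in a few lines of plain Python. This is an illustrative helper, not part of FlagEmbedding; the function name `sigmoid` and the sample scores are assumptions made for the example.

```python
import math

def sigmoid(score: float) -> float:
    """Map a raw relevance score (a logit) to a value between 0 and 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)

# Placeholder scores for illustration only, not real model outputs.
raw_scores = [-5.0, 0.0, 5.0]
normalized = [sigmoid(s) for s in raw_scores]
print(normalized)
```

Because the sigmoid is monotonic, normalizing the scores this way does not change the order in which passages are ranked.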

Here, we introduce a lightweight reranker, bge-reranker-v2.5-gemma2-lightweight, a multilingual model trained on top of gemma2-9b. By integrating token compression and layerwise reduction, the model maintains outstanding performance while saving significant resources.

Our model primarily demonstrates the following capabilities:

Lightweight: The model can be made lightweight through token compression, layerwise reduction, or a combination of both.
Outstanding performance: The model has achieved new state-of-the-art (SOTA) performance on both BEIR and MIRACL.

We will soon release a technical report on the lightweight reranker with more details.

You can use bge-reranker-v2.5-gemma2-lightweight with any of the following prompts:

Predict whether passage B contains an answer to query A.
Predict whether passages A and B have the same meaning.
Predict whether queries A and B are asking the same thing.
Predict whether argument A and counterargument B express contradictory opinions.
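As a rough illustration of where the prompt goes, the sketch below lays out a pair at the text level using the same `A:` / `B:` prefixes and newline separator that `get_inputs` uses in the Transformers example of this card. The task keys and the `format_pair` helper are hypothetical names chosen for this example, and the real pipeline tokenizes each piece separately and prepends a BOS token, so treat this as a simplified view rather than the exact model input.

```python
# Task keys are hypothetical labels; the prompt strings are the ones listed above.
PROMPTS = {
    "qa": "Predict whether passage B contains an answer to query A.",
    "passage_similarity": "Predict whether passages A and B have the same meaning.",
    "query_duplicate": "Predict whether queries A and B are asking the same thing.",
    "counterargument": "Predict whether argument A and counterargument B express contradictory opinions.",
}

def format_pair(text_a: str, text_b: str, task: str = "qa") -> str:
    """Return the text-level layout: 'A: ...', 'B: ...', then the task prompt."""
    return "\n".join([f"A: {text_a}", f"B: {text_b}", PROMPTS[task]])

print(format_pair("what is panda?", "The giant panda is a bear species endemic to China."))
```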

Model List

Model	Base model	Language	layerwise	compress ratio	compress layers	feature
BAAI/bge-reranker-base	xlm-roberta-base	Chinese and English	-	-	-	Lightweight reranker model, easy to deploy, with fast inference.
BAAI/bge-reranker-large	xlm-roberta-large	Chinese and English	-	-	-	Lightweight reranker model, easy to deploy, with fast inference.
BAAI/bge-reranker-v2-m3	bge-m3	Multilingual	-	-	-	Lightweight reranker model, possesses strong multilingual capabilities, easy to deploy, with fast inference.
BAAI/bge-reranker-v2-gemma	gemma-2b	Multilingual	-	-	-	Suitable for multilingual contexts, performs well in both English and multilingual settings.
BAAI/bge-reranker-v2-minicpm-layerwise	MiniCPM-2B-dpo-bf16	Multilingual	8-40	-	-	Suitable for multilingual contexts, performs well in both English and Chinese proficiency, allows freedom to select layers for output, facilitating accelerated inference.
BAAI/bge-reranker-v2.5-gemma2-lightweight	google/gemma-2-9b	Multilingual	8-42	1, 2, 4, 8	[8, 16, 24, 32, 40]	Suitable for multilingual contexts, performs well in both English and Chinese proficiency, allows freedom to select layers, compress ratio and compress layers for output, facilitating accelerated inference.

You can select a model according to your scenario and resources.

For multilingual use, choose BAAI/bge-reranker-v2-m3, BAAI/bge-reranker-v2-gemma and BAAI/bge-reranker-v2.5-gemma2-lightweight.
For Chinese or English, choose BAAI/bge-reranker-v2-m3 and BAAI/bge-reranker-v2-minicpm-layerwise.
For efficiency, choose BAAI/bge-reranker-v2-m3 and the lower layers of BAAI/bge-reranker-v2-minicpm-layerwise.
For better performance, we recommend BAAI/bge-reranker-v2-minicpm-layerwise and BAAI/bge-reranker-v2-gemma.

Usage

Using FlagEmbedding

git clone https://github.com/FlagOpen/FlagEmbedding.git
cd FlagEmbedding
pip install -e .

For LLM-based lightweight reranker

from FlagEmbedding import LightWeightFlagLLMReranker
reranker = LightWeightFlagLLMReranker('BAAI/bge-reranker-v2.5-gemma2-lightweight', use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28], compress_ratio=2, compress_layer=[24, 40]) # Adjusting 'cutoff_layers' to pick which layers are used for computing the score.
print(score)

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']], cutoff_layers=[28], compress_ratio=2, compress_layer=[24, 40])
print(scores)
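
Once scores are available, reranking amounts to sorting the candidate passages by score. The sketch below assumes one float score per query-passage pair; `rank_passages` is a hypothetical helper and the score values are placeholders, not real model outputs.

```python
def rank_passages(passages, scores):
    """Return (passage, score) pairs sorted from most to least relevant."""
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    return sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)

passages = ["hi", "The giant panda is a bear species endemic to China."]
scores = [-3.2, 6.1]  # placeholder values for illustration only
for passage, score in rank_passages(passages, scores):
    print(f"{score:+.2f}  {passage}")
```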

Using Hugging Face Transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def last_logit_pool(logits: torch.Tensor,
                    attention_mask: torch.Tensor) -> torch.Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return logits[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = logits.shape[0]
        return torch.stack([logits[i, sequence_lengths[i]] for i in range(batch_size)], dim=0)

def get_inputs(pairs, tokenizer, prompt=None, max_length=1024):
    if prompt is None:
        prompt = "Predict whether passage B contains an answer to query A."
    sep = "\n"
    prompt_inputs = tokenizer(prompt,
                              return_tensors=None,
                              add_special_tokens=False)['input_ids']
    sep_inputs = tokenizer(sep,
                           return_tensors=None,
                           add_special_tokens=False)['input_ids']
    inputs = []
    query_lengths = []
    prompt_lengths = []
    for query, passage in pairs:
        query_inputs = tokenizer(f'A: {query}',
                                 return_tensors=None,
                                 add_special_tokens=False,
                                 max_length=max_length * 3 // 4,
                                 truncation=True)
        passage_inputs = tokenizer(f'B: {passage}',
                                   return_tensors=None,
                                   add_special_tokens=False,
                                   max_length=max_length,
                                   truncation=True)
        item = tokenizer.prepare_for_model(
            [tokenizer.bos_token_id] + query_inputs['input_ids'],
            sep_inputs + passage_inputs['input_ids'],
            truncation='only_second',
            max_length=max_length,
            padding=False,
            return_attention_mask=False,
            return_token_type_ids=False,
            add_special_tokens=False
        )
        item['input_ids'] = item['input_ids'] + sep_inputs + prompt_inputs
        item['attention_mask'] = [1] * len(item['input_ids'])
        inputs.append(item)
        query_lengths.append(len([tokenizer.bos_token_id] + query_inputs['input_ids'] + sep_inputs))
        prompt_lengths.append(len(sep_inputs + prompt_inputs))
        
    return tokenizer.pad(
            inputs,
            padding=True,
            max_length=max_length + len(sep_inputs) + len(prompt_inputs),
            pad_to_multiple_of=8,
            return_tensors='pt',
    ), query_lengths, prompt_lengths

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2.5-gemma2-lightweight', trust_remote_code=True)
tokenizer.padding_side = 'right'
model = AutoModelForCausalLM.from_pretrained('BAAI/bge-reranker-v2.5-gemma2-lightweight', trust_remote_code=True)
model = model.to('cuda')
model.eval()

pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    inputs, query_lengths, prompt_lengths = get_inputs(pairs, tokenizer)
    inputs = inputs.to(model.device)
    outputs = model(**inputs,
                    return_dict=True,
                    cutoff_layers=[28],
                    compress_ratio=2,
                    compress_layer=[24, 40],
                    query_lengths=query_lengths,
                    prompt_lengths=prompt_lengths)
    scores = []
    for i in range(len(outputs.logits)):
        logits = last_logit_pool(outputs.logits[i], outputs.attention_masks[i])
        scores.append(logits.cpu().float().tolist())
    print(scores)

Loading the model locally

Make sure gemma_config.py and gemma_model.py from BAAI/bge-reranker-v2.5-gemma2-lightweight are in your local model path.
Modify the following part of config.json:

"auto_map": {
    "AutoConfig": "gemma_config.CostWiseGemmaConfig",
    "AutoModel": "gemma_model.CostWiseGemmaModel",
    "AutoModelForCausalLM": "gemma_model.CostWiseGemmaForCausalLM"
  },
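
If you prefer to apply this change with a script rather than by hand, a minimal sketch using only the standard library might look like the following. The helper name `patch_auto_map` and the directory in the usage comment are assumptions for illustration; the three `auto_map` entries are the ones quoted above.

```python
import json
from pathlib import Path

# The three auto_map entries quoted above.
AUTO_MAP = {
    "AutoConfig": "gemma_config.CostWiseGemmaConfig",
    "AutoModel": "gemma_model.CostWiseGemmaModel",
    "AutoModelForCausalLM": "gemma_model.CostWiseGemmaForCausalLM",
}

def patch_auto_map(model_dir: str) -> dict:
    """Overwrite the auto_map section of <model_dir>/config.json in place."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["auto_map"] = dict(AUTO_MAP)  # all other keys are left untouched
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config

# Usage (the directory name is a placeholder for your local copy):
# patch_auto_map("./bge-reranker-v2.5-gemma2-lightweight")
```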

Evaluation

The configuration that saves 60% of FLOPs is: compress_ratio=2, compress_layer=[8], cutoff_layers=[25].
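
The configuration quoted above can be written as keyword arguments and checked against the values listed for bge-reranker-v2.5-gemma2-lightweight in the Model List table (layers 8-42, compress ratios 1, 2, 4, 8, compress layers 8, 16, 24, 32, 40). The helper `make_config` is hypothetical, and treating the table values as the complete set of valid options is an assumption of this sketch.

```python
# Values taken from the Model List table; assumed here to be exhaustive.
LAYER_RANGE = range(8, 43)                     # layerwise: 8-42
ALLOWED_RATIOS = (1, 2, 4, 8)                  # compress ratio
ALLOWED_COMPRESS_LAYERS = (8, 16, 24, 32, 40)  # compress layers

def make_config(cutoff_layers, compress_ratio, compress_layer):
    """Validate the three settings and return them as keyword arguments."""
    bad_cutoff = [layer for layer in cutoff_layers if layer not in LAYER_RANGE]
    if bad_cutoff:
        raise ValueError(f"cutoff_layers outside 8-42: {bad_cutoff}")
    if compress_ratio not in ALLOWED_RATIOS:
        raise ValueError(f"compress_ratio must be one of {ALLOWED_RATIOS}")
    bad_compress = [layer for layer in compress_layer
                    if layer not in ALLOWED_COMPRESS_LAYERS]
    if bad_compress:
        raise ValueError(f"compress_layer not in {ALLOWED_COMPRESS_LAYERS}: {bad_compress}")
    return {
        "cutoff_layers": list(cutoff_layers),
        "compress_ratio": compress_ratio,
        "compress_layer": list(compress_layer),
    }

# The FLOPs-saving setting used for the evaluation tables.
flops_saving_config = make_config(cutoff_layers=[25], compress_ratio=2, compress_layer=[8])
print(flops_saving_config)
```

The resulting dictionary can be unpacked into the scoring call from the Usage section, for example `reranker.compute_score(pairs, **flops_saving_config)`.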

BEIR:

BEIR	bge-large-en-v1.5	bge-reranker-v2-m3	jina-reranker-v2-base-multilingual	bge-reranker-v2-gemma	bge-reranker-v2.5-gemma2-lightweight	bge-reranker-v2.5-gemma2-lightweight
FLOPs saved	-	-	-	-	60%	0
ArguAna	63.54	37.7	52.23	78.68	86.04	86.16
ClimateFEVER	36.49	37.99	34.65	39.07	48.41	48.48
CQA	42.23	38.24	40.21	45.85	49.18	48.9
DBPedia	44.16	48.15	49.31	49.92	51.98	52.11
FEVER	87.17	90.15	92.44	90.15	94.71	94.69
FiQA2018	44.97	49.32	45.88	49.32	60.48	60.95
HotpotQA	74.11	84.51	81.81	86.15	87.84	87.89
MSMARCO	42.48	47.79	47.83	48.07	47.23	47.26
NFCorpus	38.12	34.85	37.73	39.73	41.4	41.64
NQ	55.04	69.37	67.35	72.6	75.37	75.58
QuoraRetrieval	89.06	89.13	87.81	90.37	91.25	91.18
SCIDOCS	22.62	18.25	20.21	21.65	23.71	23.87
SciFact	74.64	73.08	76.93	77.22	80.5	80.38
Touche2020	25.08	35.68	32.45	35.68	30.64	31.09
TRECCOVID	74.89	83.39	80.89	85.51	84.26	84.85
Mean	54.31	55.36	56.52	60.71	63.1	63.67

BEIR	e5-mistral-7b-instruct	bge-reranker-v2-gemma	bge-reranker-v2.5-gemma2-lightweight	bge-reranker-v2.5-gemma2-lightweight
FLOPs saved	-	-	60%	0
ArguAna	61.8	79.05	86.02	86.58
ClimateFEVER	38.37	37.66	47.27	47.13
CQA	42.97	46.16	49.06	49.53
DBPedia	48.84	50.77	52.45	52.87
FEVER	87.82	91.36	94.85	95.19
FiQA2018	56.58	50.96	58.81	61.19
HotpotQA	75.72	86.99	88.49	88.82
MSMARCO	43.06	48.35	47.65	47.4
NFCorpus	38.58	39.25	42.28	42.17
NQ	63.56	73.44	75	76.28
QuoraRetrieval	89.59	90.44	91.09	91.18
SCIDOCS	16.3	20.77	22.2	22.69
SciFact	76.26	77.78	79.94	80.98
Touche2020	26.24	35.79	28.69	31.17
TRECCOVID	87.07	88.13	86.61	87.36
Mean	56.85	61.13	63.36	64.04

MIRACL:

MIRACL (dev, nDCG@10)	Average (18)	FLOPs saved	ar	bn	en	es	fa	fi	fr	hi	id	ja	ko	ru	sw	te	th	zh	de	yo
bge-m3 (Dense)	69.2	-	78.4	80.0	56.9	56.1	60.9	78.6	58.3	59.5	56.1	72.8	69.9	70.1	78.7	86.2	82.6	62.7	56.7	81.8
jina-reranker-v2-base-multilingual	69.6	-	73.4	81.9	58.9	58.6	60.5	77.2	56.1	62.7	59.6	72.7	74.0	67.1	78.1	85.8	81.2	63.0	58.2	84.2
bge-reranker-v2-m3	74.4	-	81.7	84.6	63.5	64.4	65.7	82.4	63.7	68.5	62.7	80.0	73.8	76.9	82.3	89.4	85.3	65.2	62.7	87.4
bge-reranker-v2-gemma	75.0	-	82.3	85.0	66.6	65.3	65.5	82.6	65.4	69.4	61.2	79.7	75.1	78.3	81.8	89.6	86.1	66.8	64.0	85.9
bge-reranker-v2.5-gemma2-lightweight	77.1	60%	82.5	87.8	68.6	67.6	67.5	82.8	68.5	71.4	63.8	82.8	75.9	79.8	84.8	90.8	88.1	69.9	65.8	89.6
bge-reranker-v2.5-gemma2-lightweight	77.3	0	82.8	87.6	69.3	67.8	67.4	83.3	68.5	71.3	63.8	83.6	75.7	80.1	85.1	90.8	88.7	69.9	65.6	89.8

Citation

If you find this repository useful, please consider giving it a star and citing our work.

@misc{li2023making,
      title={Making Large Language Models A Better Foundation For Dense Retrieval}, 
      author={Chaofan Li and Zheng Liu and Shitao Xiao and Yingxia Shao},
      year={2023},
      eprint={2312.15503},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@misc{chen2024bge,
      title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation}, 
      author={Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu},
      year={2024},
      eprint={2402.03216},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}