Drowning in Documents: Consequences of Scaling Reranker Inference
Abstract
Rerankers, typically cross-encoders, are often used to re-score the documents retrieved by cheaper initial IR systems. This is because, though expensive, rerankers are assumed to be more effective. We challenge this assumption by measuring reranker performance for full retrieval, not just re-scoring first-stage retrieval. Our experiments reveal a surprising trend: the best existing rerankers provide diminishing returns when scoring progressively more documents and actually degrade quality beyond a certain limit. In fact, in this setting, rerankers can frequently assign high scores to documents with no lexical or semantic overlap with the query. We hope that our findings will spur future research to improve reranking.
Community
Rerankers (cross-encoders) and retrievers (embedding models) are often built on the same architecture, yet rerankers are assumed to be more accurate because they jointly encode the query and document rather than processing them independently. In this work, we find two surprising results with respect to this intuition: (1) reranking helps at first, but eventually reranking too many documents decreases quality, and (2) in a fair matchup between rerankers and retrievers, where we rerank the full dataset, rerankers are less accurate than retrievers. In the paper we detail extensive experiments across both academic and enterprise datasets, and include results suggesting that listwise reranking with LLMs is more robust than cross-encoders when scaling inference via reranking.
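To make the bi-encoder vs. cross-encoder distinction concrete, here is a minimal sketch of the retrieve-then-rerank pipeline the paper studies, assuming the sentence-transformers library; the model names and toy corpus are placeholders rather than the paper's exact configuration.

```python
# A minimal sketch of the two-stage retrieve-then-rerank setup discussed above,
# written with the sentence-transformers library. The model names and the toy
# corpus are illustrative assumptions, not the paper's exact setup.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

corpus = [
    "Rerankers re-score candidates produced by a first-stage retriever.",
    "Bi-encoders embed queries and documents independently.",
    "Cross-encoders jointly encode the query and each candidate document.",
]
query = "How do cross-encoders differ from bi-encoders?"

# First stage: a bi-encoder (embedding) retriever scores every document cheaply.
retriever = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = retriever.encode(corpus, convert_to_tensor=True)
query_emb = retriever.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_emb, doc_emb, top_k=len(corpus))[0]

# Second stage: a cross-encoder reranks the top-k candidates.
# The experiments in the paper vary k up to the full corpus; reranking helps
# for small k but can hurt as k approaches the whole collection.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
for k in (1, 2, len(corpus)):
    candidates = [corpus[hit["corpus_id"]] for hit in hits[:k]]
    scores = reranker.predict([(query, doc) for doc in candidates])
    reranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    print(f"k={k}: top document -> {reranked[0][0]!r}")
```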
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback (2024)
- JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking (2024)
- An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking (2024)
- RARe: Retrieval Augmented Retrieval with In-Context Examples (2024)
- Few-shot Prompting for Pairwise Ranking: An Effective Non-Parametric Retrieval Model (2024)
Extremely interesting, nice work!
We maintain some domain-specific hybrid search systems and this paper has shown us we need to look at optimizing the top-k in our cross-encoder phase. Interesting work - I'm a little disappointed more CE models weren't used (mixed-bread).