Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback
Abstract
Building effective dense retrieval systems remains difficult when relevance supervision is not available. Recent work has tried to overcome this challenge by using a Large Language Model (LLM) to generate hypothetical documents that can be used to find the closest real document. However, this approach depends entirely on the LLM having domain-specific knowledge relevant to the query, which may not be practical. Furthermore, generating hypothetical documents can be inefficient, as it requires the LLM to generate a large number of tokens for each query. To address these challenges, we introduce Real Document Embeddings from Relevance Feedback (ReDE-RF). Inspired by relevance feedback, ReDE-RF re-frames hypothetical document generation as a relevance estimation task, using an LLM to select which documents should be used for nearest neighbor search. Through this re-framing, the LLM no longer needs domain-specific knowledge; it only needs to judge what is relevant. Additionally, relevance estimation only requires the LLM to output a single token, thereby improving search latency. Our experiments show that ReDE-RF consistently surpasses state-of-the-art zero-shot dense retrieval methods across a wide range of low-resource retrieval datasets while also significantly improving per-query latency.
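The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`top_k`, `rede_rf_sketch`), the parameters `k_feedback` and `k_final`, and the toy embeddings are hypothetical. The `judge` callable stands in for an LLM that emits a single relevant/not-relevant token per candidate. Mean-pooling the selected embeddings and falling back to the query embedding when nothing is judged relevant are assumptions made for illustration; the abstract does not specify either.

```python
import numpy as np


def top_k(search_vec, doc_vecs, k):
    """Return indices of the k documents with the highest inner-product score."""
    scores = doc_vecs @ search_vec
    return np.argsort(-scores)[:k]


def rede_rf_sketch(query_vec, doc_vecs, judge, k_feedback=3, k_final=2):
    """Illustrative relevance-feedback retrieval loop (assumptions noted inline)."""
    # Step 1: initial retrieval with the plain query embedding (assumed first stage).
    candidates = top_k(query_vec, doc_vecs, k_feedback)

    # Step 2: relevance estimation. `judge` is a stand-in for an LLM that
    # outputs one token (relevant or not) for each candidate document.
    relevant = [int(i) for i in candidates if judge(int(i))]

    # Step 3: build the search vector from real document embeddings.
    if relevant:
        # Assumption: average the embeddings of the documents judged relevant.
        search_vec = doc_vecs[relevant].mean(axis=0)
    else:
        # Assumption: fall back to the query embedding when nothing is relevant.
        search_vec = query_vec

    # Step 4: nearest neighbor search with the updated vector.
    return top_k(search_vec, doc_vecs, k_final), relevant


# Toy corpus of four 2-d document embeddings (made-up values).
doc_vecs = np.array([
    [1.0, 0.0],
    [0.8, 0.6],
    [0.0, 1.0],
    [0.6, 0.8],
])
query_vec = np.array([1.0, 0.1])

# Hypothetical judge: pretend the LLM marks documents 1 and 3 as relevant.
ranked, relevant = rede_rf_sketch(query_vec, doc_vecs, judge=lambda i: i in {1, 3})
print("judged relevant:", relevant)
print("final ranking:", ranked.tolist())
```

In this toy run the plain query embedding ranks document 0 first, while the vector built from the judged-relevant documents shifts the final search toward documents 1 and 3. The LLM is only ever asked for a yes/no judgment, which is the source of the latency advantage the abstract describes.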
Community
- We introduce ReDE-RF, a zero-shot approach for building dense retrievers in domains where generating hypothetical documents is challenging.
- We show that re-framing hypothetical document generation as relevance estimation can improve retrieval accuracy and search latency compared to previous SOTA approaches that leverage LLMs at inference time.
- Code to be released soon!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- RARe: Retrieval Augmented Retrieval with In-Context Examples (2024)
- AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels (2024)
- PairDistill: Pairwise Relevance Distillation for Dense Retrieval (2024)
- Few-shot Prompting for Pairwise Ranking: An Effective Non-Parametric Retrieval Model (2024)
- HyQE: Ranking Contexts with Hypothetical Query Embeddings (2024)