@m-ric on Hugging Face: "𝗛𝗼𝘄 𝘁𝗼 𝗿𝗲-𝗿𝗮𝗻𝗸 𝘆𝗼𝘂𝗿 𝘀𝗻𝗶𝗽𝗽𝗲𝘁𝘀 𝗶𝗻 𝗥𝗔𝗚 ⇒ ColBERT…"

𝗛𝗼𝘄 𝘁𝗼 𝗿𝗲-𝗿𝗮𝗻𝗸 𝘆𝗼𝘂𝗿 𝘀𝗻𝗶𝗽𝗽𝗲𝘁𝘀 𝗶𝗻 𝗥𝗔𝗚 ⇒ ColBERT, Rerankers, Cross-Encoders

Let’s say you’re doing RAG, and in an effort to improve performance, you try to rerank a few possible source snippets by their relevancy to a query.

How can you score similarity between your query and any source document? 🤔 📄 ↔️ 📑

𝟭. 𝗝𝘂𝘀𝘁 𝘂𝘀𝗲 𝗲𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴𝘀 : 𝗡𝗼-𝗶𝗻𝘁𝗲𝗿𝗮𝗰𝘁𝗶𝗼𝗻 🏎️

This means that you encode each token from both the query and the doc as separate vectors, then pool (for example, average) the token vectors of each side separately to get 2 vectors in total, then you compute similarity between those two, typically with cosine similarity.
➡️ Notable examples: Check the top of the MTEB leaderboard!
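
A minimal sketch of the no-interaction scoring, assuming you already have per-token vectors from some encoder. The random arrays stand in for real model outputs, and the names `mean_pool` and `no_interaction_score` are just illustrative:

```python
import numpy as np


def mean_pool(token_vectors: np.ndarray) -> np.ndarray:
    """Collapse (num_tokens, dim) token vectors into a single (dim,) vector."""
    return token_vectors.mean(axis=0)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def no_interaction_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    # Each side is pooled on its own, so query and doc never "see" each other
    # until the final comparison of 2 vectors.
    return cosine(mean_pool(query_tokens), mean_pool(doc_tokens))


# Placeholder token vectors (assumption: a real encoder would produce these).
rng = np.random.default_rng(0)
query_tokens = rng.normal(size=(4, 8))   # 4 query tokens, dim 8
doc_tokens = rng.normal(size=(30, 8))    # 30 doc tokens, dim 8

score = no_interaction_score(query_tokens, doc_tokens)
```

Since the doc vector does not depend on the query, it can be computed once and stored, which is why this option is the fastest.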

𝟮. 𝗟𝗮𝘁𝗲-𝗶𝗻𝘁𝗲𝗿𝗮𝗰𝘁𝗶𝗼𝗻: 𝘁𝗵𝗶𝘀 𝗶𝘀 𝗖𝗼𝗹𝗕𝗘𝗥𝗧 🏃

These encode each token from both query and doc as separate vectors as before, but compare every query token with every doc token directly, without first averaging them and losing information.

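This is more accurate than no-interaction but also slower, because you have to compute n*m token-to-token similarities (n query tokens, m doc tokens) instead of comparing just 2 vectors. At least you can precompute and store the document token embeddings, so only the query needs encoding at search time. And ColBERT has some optimisations like pooling to be faster.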

➡️ Notable examples: ColBERTv2, mxbai-colbert-large-v1, jina-colbert-v2
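
Here is a rough sketch of the late-interaction score in the MaxSim style used by ColBERT: for each query token, take its best-matching doc token, then sum those maxima. It assumes the token vectors are already computed, and the function names are illustrative, not taken from any ColBERT library:

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Normalize each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def maxsim_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    q = l2_normalize(query_tokens)   # (n, dim)
    d = l2_normalize(doc_tokens)     # (m, dim)
    sim = q @ d.T                    # (n, m): all n*m token-to-token similarities
    # Best doc token for each query token, summed over the query tokens.
    return float(sim.max(axis=1).sum())


# Placeholder token vectors (assumption: a real model would produce these).
rng = np.random.default_rng(0)
query_tokens = rng.normal(size=(4, 8))   # n = 4
doc_tokens = rng.normal(size=(30, 8))    # m = 30, can be precomputed and stored

score = maxsim_score(query_tokens, doc_tokens)
```

The `(n, m)` matrix is where the extra cost over the 2-vector comparison comes from.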

𝟯. 𝗘𝗮𝗿𝗹𝘆 𝗶𝗻𝘁𝗲𝗿𝗮𝗰𝘁𝗶𝗼𝗻: 𝗖𝗿𝗼𝘀𝘀-𝗲𝗻𝗰𝗼𝗱𝗲𝗿𝘀 🏋️

Basically you run the concatenated query + document through a model to get a final score.

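This is the most accurate, but also the slowest, since it takes really long when you have many docs to rerank: you need one full model pass per (query, doc) pair! And you cannot pre-store embeddings, because the model needs to see the query and the doc together.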

➡️ Notable examples: MixedBread or Jina AI rerankers!
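
To make the cost concrete, here is a sketch of the reranking loop around a pair scorer. `toy_pair_scorer` is a made-up word-overlap stand-in, NOT a real cross-encoder; in practice you would plug in a model that scores the (query, doc) pair jointly:

```python
from typing import Callable, List, Tuple


def toy_pair_scorer(query: str, doc: str) -> float:
    """Hypothetical stand-in for one cross-encoder forward pass.

    Returns the fraction of query words that also appear in the doc.
    """
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / max(len(q_words), 1)


def rerank(
    query: str,
    docs: List[str],
    score_pair: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    # One scorer call per doc: nothing about a doc can be cached ahead of
    # time, because every score depends on the query too.
    scored = [(doc, score_pair(query, doc)) for doc in docs]
    return sorted(scored, key=lambda item: item[1], reverse=True)


docs = [
    "ColBERT compares token vectors late",
    "Bananas are rich in potassium",
    "Cross-encoders read query and document together",
]
ranked = rerank("how do cross-encoders score a document", docs, toy_pair_scorer)
```

With a real model, each `score_pair` call is a full forward pass, so the total time grows linearly with the number of docs you rerank.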

🚀 So what you choose is a trade-off between speed and accuracy: I think ColBERT is often a really good choice!

Based on this great post by Jina AI 👉 https://jina.ai/news/what-is-colbert-and-late-interaction-and-why-they-matter
