Putting it all together

When you use the document encoder in an indexing pipeline, the rewritten document contents are indexed:

Pipeline: D → SPLADE → D → Indexer → IDX

import pyterrier as pt
import pyt_splade

dataset = pt.get_dataset('irds:msmarco-passage')
splade = pyt_splade.Splade()

indexer = pt.IterDictIndexer('./msmarco_psg', pretokenised=True)

indexer_pipe = splade.doc_encoder() >> indexer
indexer_pipe.index(dataset.get_corpus_iter())
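
To make "pretokenised" more concrete, here is a minimal pure-Python sketch of the kind of record such an indexer can consume: a mapping from term to an integer pseudo term frequency. The helper name `to_pseudo_tf`, the example weights, and the scale factor of 100 are illustrative assumptions, not values taken from `pyt_splade`.

```python
def to_pseudo_tf(weights, scale=100):
    """Convert float term weights to integer pseudo term frequencies.

    Terms whose scaled weight rounds to zero are dropped.
    `scale` is a hypothetical choice made for this illustration.
    """
    toks = {}
    for term, weight in weights.items():
        tf = int(round(weight * scale))
        if tf > 0:
            toks[term] = tf
    return toks


# Hypothetical expansion weights for one document (not real SPLADE output).
doc = {
    'docno': 'd1',
    'toks': to_pseudo_tf({'chemical': 1.27, 'reaction': 0.84, 'the': 0.001}),
}
print(doc)
```

Storing integer frequencies lets a standard inverted index hold the learned weights without needing a custom posting format.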

Once you have built an index, you can build a retrieval pipeline that first encodes the query and then performs retrieval:

Pipeline: Q → SPLADE → Q → TF Retriever (IDX) → R

splade_retr = splade.query_encoder() >> pt.terrier.Retriever('./msmarco_psg', wmodel='Tf')
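
The choice of a plain term-frequency model can be illustrated with a small pure-Python sketch. It assumes that TF scoring sums, over the query terms, the query-term weight multiplied by the stored document term frequency, which makes the score a dot product between the sparse query and document vectors. That reading of the `Tf` model is an assumption worth checking against the Terrier documentation. The function `tf_score`, the toy index, and all weights are hypothetical.

```python
def tf_score(query_weights, doc_toks):
    """Sum of (query weight * document term frequency) over shared terms."""
    return sum(w * doc_toks.get(term, 0) for term, w in query_weights.items())


# Toy pretokenised index: docno -> {term: pseudo term frequency} (made-up values).
index = {
    'd1': {'chemical': 127, 'reaction': 84},
    'd2': {'reaction': 40, 'nuclear': 150},
}

# Hypothetical weighted query, standing in for the query encoder's output.
query = {'chemical': 2.0, 'reaction': 1.5}

# Rank documents by descending score.
ranking = sorted(
    ((docno, tf_score(query, toks)) for docno, toks in index.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranking)
```

Under this assumption, no length normalisation or IDF is applied at retrieval time, so the ranking depends only on the learned weights.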

References & Credits

This package uses Naver's SPLADE repository.