Upload README.md with huggingface_hub
## Fine-tune

### Data Format

Training data should be a JSON Lines file, where each line is a JSON object like this:

```
{"query": str, "pos": List[str], "neg": List[str], "prompt": str}
```

`query` is the query, `pos` is a list of positive texts, `neg` is a list of negative texts, and `prompt` describes the relationship between the query and the texts. If you have no negative texts for a query, you can randomly sample some from the entire corpus to use as negatives.
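The random-sampling step described above can be sketched in Python. This is a minimal illustration, not part of FlagEmbedding: `build_record` and `write_jsonl` are hypothetical helper names, negatives are drawn uniformly at random from `corpus` (skipping the query's own positives), and the prompt string is a placeholder. Uniform sampling can occasionally pick a text that is relevant but unlabeled, so treat these as approximate negatives.

```python
import json
import random
from typing import Dict, List, Optional


def build_record(
    query: str,
    pos: List[str],
    corpus: List[str],
    prompt: str,
    neg: Optional[List[str]] = None,
    num_neg: int = 3,
    rng: Optional[random.Random] = None,
) -> Dict[str, object]:
    """Return one example in the {"query", "pos", "neg", "prompt"} format.

    If `neg` is missing or empty, up to `num_neg` negatives are drawn uniformly
    at random (without replacement) from `corpus`, excluding this query's positives.
    """
    if rng is None:
        rng = random.Random()
    if not neg:
        positives = set(pos)
        candidates = [text for text in corpus if text not in positives]
        neg = rng.sample(candidates, k=min(num_neg, len(candidates)))
    return {"query": query, "pos": list(pos), "neg": list(neg), "prompt": prompt}


def write_jsonl(records: List[Dict[str, object]], path: str) -> None:
    """Write one JSON object per line; ensure_ascii=False keeps non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    corpus = [
        "Paris is the capital of France.",
        "The Nile is a river in Africa.",
        "Python is a programming language.",
        "Mount Everest is the highest mountain above sea level.",
    ]
    # Hypothetical prompt text: replace it with the instruction you actually train with.
    prompt = "Does the passage answer the query? Answer 'Yes' or 'No'."
    record = build_record(
        query="What is the capital of France?",
        pos=["Paris is the capital of France."],
        corpus=corpus,
        prompt=prompt,
        num_neg=2,
        rng=random.Random(0),  # seeded so the sampled negatives are reproducible
    )
    write_jsonl([record], "train_data.jsonl")
    print(json.dumps(record, ensure_ascii=False))
```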
See [toy_finetune_data.jsonl](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_reranker/toy_finetune_data.jsonl) for a toy data file.
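Before training, it can help to check that every line of your file matches this format. The sketch below is a hypothetical helper, not a FlagEmbedding API. It checks the field types listed above and, as an assumption not stated in this README, also rejects empty `pos` or `neg` lists.

```python
import json
from typing import Dict, Iterator

STR_FIELDS = ("query", "prompt")
LIST_FIELDS = ("pos", "neg")


def iter_training_records(path: str) -> Iterator[Dict[str, object]]:
    """Yield records from a JSON Lines file, raising ValueError on the first bad line.

    `query` and `prompt` must be strings; `pos` and `neg` must be non-empty
    lists of strings. Blank lines are skipped. Malformed JSON raises
    json.JSONDecodeError, which is a subclass of ValueError.
    """
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(f"line {line_no}: expected a JSON object")
            for key in STR_FIELDS:
                if not isinstance(record.get(key), str):
                    raise ValueError(f"line {line_no}: '{key}' must be a string")
            for key in LIST_FIELDS:
                value = record.get(key)
                if not isinstance(value, list) or not all(isinstance(t, str) for t in value):
                    raise ValueError(f"line {line_no}: '{key}' must be a list of strings")
                if not value:
                    raise ValueError(f"line {line_no}: '{key}' must not be empty")
            yield record
```

For example, `list(iter_training_records("toy_finetune_data.jsonl"))` on a local copy of the toy file either returns all records or raises a `ValueError` naming the first offending line.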
### Train

You can fine-tune the reranker with the following code:

**For LLM-based reranker**