catyung/t5l-turbo-hotpot-0331

Model Description

Query Rewriting in Retrieval-Augmented Large Language Models

arXiv: https://arxiv.org/abs/2305.14283

Large Language Models (LLMs) act as powerful, black-box readers in the retrieve-then-read pipeline and have made remarkable progress on knowledge-intensive tasks. This work introduces a new framework, Rewrite-Retrieve-Read, which replaces the previous retrieve-then-read pipeline for retrieval-augmented LLMs and approaches the problem from the perspective of query rewriting. We first prompt an LLM to generate the query, then use a web search engine to retrieve contexts. Furthermore, to better align the query with the frozen modules, we propose a trainable scheme for our pipeline. A small language model is adopted as a trainable rewriter to cater to the black-box LLM reader. The rewriter is trained by reinforcement learning, using feedback from the LLM reader.
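The three stages described above can be sketched as a small function that chains a rewriter, a retriever, and a reader. This is only an illustration of the control flow, not the authors' implementation: the function name `rewrite_retrieve_read`, the `top_k` parameter, and the three toy stand-in callables are all assumptions made for this example. In practice the rewriter would be this model, the retriever a web search engine, and the reader a black-box LLM.

```python
from typing import Callable, Dict, List


def rewrite_retrieve_read(
    question: str,
    rewrite: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    read: Callable[[str, List[str]], str],
    top_k: int = 3,  # hypothetical parameter: how many contexts to keep
) -> Dict[str, object]:
    """Run the three stages in order and return every intermediate result."""
    query = rewrite(question)            # 1. rewrite the question into a search query
    contexts = retrieve(query)[:top_k]   # 2. retrieve contexts for the rewritten query
    answer = read(question, contexts)    # 3. read: answer the ORIGINAL question
    return {"query": query, "contexts": contexts, "answer": answer}


# Toy stand-ins (hypothetical) so the sketch runs without any model or network.
def toy_rewrite(question: str) -> str:
    return question.rstrip("?").lower()


def toy_retrieve(query: str) -> List[str]:
    corpus = [
        "Nicholas Ray was an American film director.",
        "Elia Kazan was an American film and theatre director.",
        "HotpotQA is a multi-hop question answering dataset.",
    ]
    # Keep documents that share at least one word with the query.
    words = set(query.split())
    return [doc for doc in corpus if words & set(doc.lower().rstrip(".").split())]


def toy_read(question: str, contexts: List[str]) -> str:
    return "director" if all("director" in c for c in contexts) and contexts else "unknown"


if __name__ == "__main__":
    result = rewrite_retrieve_read(
        "What profession does Nicholas Ray and Elia Kazan have in common?",
        toy_rewrite, toy_retrieve, toy_read, top_k=2,
    )
    print(result)
```

Passing the stages in as callables keeps the retriever and reader frozen and swappable, which mirrors the setup in which only the rewriter is trained.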

Developed by: https://github.com/xbmxb/RAG-query-rewriting
Model type: google/t5-large
Checkpoint: checkpoint_20

Inference

from transformers import T5Tokenizer, T5ForConditionalGeneration, BitsAndBytesConfig
import torch

# 8-bit quantization (requires the bitsandbytes and accelerate packages and a CUDA GPU)
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = T5ForConditionalGeneration.from_pretrained('catyung/t5l-turbo-hotpot-0331',
                                quantization_config=quantization_config)

tokenizer = T5Tokenizer.from_pretrained('catyung/t5l-turbo-hotpot-0331')

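# Inference
user_query = "What profession does Nicholas Ray and Elia Kazan have in common?"

# Build the prompt only after user_query is defined: the f-string is evaluated immediately
rewrite_prompt = f"""rewrite a better search query: {user_query}
answer:"""

# Tokenize the prompt and move the input IDs to the same device as the model
input_ids = tokenizer(rewrite_prompt, return_tensors="pt").input_ids.to(device)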

outputs = model.generate(input_ids, max_new_tokens=50)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(result)
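
When rewriting several queries, it may help to build the prompt with a small helper rather than an inline f-string, so the prompt is always constructed from a query that already exists. The sketch below reuses the prompt wording from the snippet above; the names `PROMPT_TEMPLATE`, `build_rewrite_prompt`, and `build_rewrite_prompts` are hypothetical and not part of this repository.

```python
from typing import Iterable, List

# Same wording as the inference snippet: instruction, query, then "answer:" on a new line.
PROMPT_TEMPLATE = "rewrite a better search query: {query}\nanswer:"


def build_rewrite_prompt(user_query: str) -> str:
    """Return the rewriter prompt for a single query (surrounding whitespace removed)."""
    return PROMPT_TEMPLATE.format(query=user_query.strip())


def build_rewrite_prompts(queries: Iterable[str]) -> List[str]:
    """Return one prompt per query, preserving input order."""
    return [build_rewrite_prompt(q) for q in queries]


if __name__ == "__main__":
    for prompt in build_rewrite_prompts([
        "What profession does Nicholas Ray and Elia Kazan have in common?",
        "  Which city hosted the 1900 Summer Olympics?  ",
    ]):
        print(prompt)
        print("---")
```

The resulting list can then be tokenized in one call, for example with `tokenizer(prompts, return_tensors="pt", padding=True)`, and the generated sequences decoded with `tokenizer.batch_decode(outputs, skip_special_tokens=True)`.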