How to use

This instruction-tuned model uses a chat template that must be applied to the input for conversational use. The easiest way to apply it is with the tokenizer's built-in chat template, as shown in the following snippet.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "BSC-LT/salamandra7b_rag_prompt_ca-en-es"

prompt = "Here is a question that you should answer based on the given context. Write a response that answers the question using only information provided in the context. Provide the answer in Spanish."

context = """Water boils at 100°C (212°F) at standard atmospheric pressure, which is at sea level.
However, this boiling point can vary depending on altitude and atmospheric pressure.
At higher altitudes, where atmospheric pressure is lower, water boils at a lower temperature.
For example, at 2,000 meters (about 6,600 feet) above sea level, water boils at around 93°C (199°F).
"""
instruction = "At what temperature does water boil?"

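# Load the tokenizer and the model weights in bfloat16 on the GPU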
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=torch.bfloat16,
)

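# Combine the instruction, the retrieved context and the question into a single
# user turn, then apply the model's chat template to build the final prompt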
content = f"{prompt}\n\nContext:\n{context}\n\nQuestion:\n{instruction}"
chat = [ { "role": "user", "content": content } ]

prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

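# Stop generation at either the standard EOS token or the <|im_end|> turn delimiter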
eos_tokens = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|im_end|>"),
]

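# Tokenize the templated prompt (the template already adds the special tokens) and generate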
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), eos_token_id=eos_tokens, max_new_tokens=200)
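
The generated sequence includes the prompt tokens, so one way to read the answer is to decode only the tokens produced after the input; this is a minimal sketch of the post-processing, and you may want to adapt it to your pipeline.

# Decode only the newly generated tokens, dropping the prompt and special tokens
answer = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(answer)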