custom system prompt gets ignored

#28
by parsapico - opened

Hi, I recently downloaded this model. The tokenizer's chat template seems to ignore a custom system prompt and always inserts its own pre-set system prompt instead.
Here is my code:

inputs = tokenizer.apply_chat_template([
    {
        "role": "system",
        "content": "test system prompt"
    },
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
], return_tensors="pt", return_dict=True, add_generation_prompt=True).to("cuda")
print(tokenizer.decode(inputs["input_ids"][0]))

This is the output:

"<|begin_of_text|>System: This is a chat between a user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions based on the context. The assistant should also indicate when the answer cannot be found in the context.\n\nUser: What is the capital of France?\n\nAssistant:"

Did you find a solution?

Not a neat solution, but I did this:

import copy

def get_formatted_input(context, messages):
    messages = copy.deepcopy(messages)
    system = "System: {system prompt}"
    instruction = "{instruction prompt}"

    # only prepend the instruction to the first user turn
    for item in messages:
        if item['role'] == "user":
            item['content'] = instruction + " " + item['content']
            break

    # render the turns in the model's "User: ... / Assistant: ..." format
    conversation = '\n\n'.join(
        ["User: " + item["content"] if item["role"] == "user"
         else "Assistant: " + item["content"] for item in messages]
    ) + "\n\nAssistant:"
    formatted_input = system + "\n\n" + context + "\n\n" + conversation

    return tokenizer.bos_token + formatted_input

def apply_chat_template(context, messages, *args, **kwargs):
    formatted_input = get_formatted_input(context, messages)
    tokenized_prompt = tokenizer(formatted_input, *args, **kwargs)
    return tokenized_prompt
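
For reference, I call it like this ({context passage} is a placeholder for whatever document you are grounding on):

messages = [{"role": "user", "content": "What is the capital of France?"}]
inputs = apply_chat_template("{context passage}", messages, return_tensors="pt").to("cuda")
# depending on the tokenizer config this may prepend a second BOS token,
# since get_formatted_input already includes tokenizer.bos_token
print(tokenizer.decode(inputs["input_ids"][0]))

Alternatively, recent transformers versions let you pass a one-off template to apply_chat_template via its chat_template argument. The template string below is only my sketch of the decoded format shown above, so verify the role prefixes against the model card:

custom_template = (
    "{{ bos_token }}"
    "{% for m in messages %}"
    "{% if m['role'] == 'system' %}System: {{ m['content'] }}\n\n"
    "{% elif m['role'] == 'user' %}User: {{ m['content'] }}\n\n"
    "{% elif m['role'] == 'assistant' %}Assistant: {{ m['content'] }}\n\n"
    "{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}Assistant:{% endif %}"
)

inputs = tokenizer.apply_chat_template(
    messages,
    chat_template=custom_template,  # used instead of the built-in template for this call only
    return_tensors="pt",
    return_dict=True,
    add_generation_prompt=True,
).to("cuda")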

Try increasing min_new_tokens in the model.generate call.
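
For example, leaving everything else as-is (the token counts here are just illustrative):

outputs = model.generate(
    **inputs,
    min_new_tokens=32,   # lower bound on the number of generated tokens
    max_new_tokens=256,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))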
