Mistral-7B-Instruct-v0.2 loopy text generation with custom chat template
Dear community,
We are using Mistral-7B-Instruct-v0.2 (off-the-shelf, no fine-tuning) with a custom chat template so that it accepts system prompts supplied as user input by GUI chat clients such as ChatBox (https://github.com/Bin-Huang/chatbox) and ChatGPT-lite (https://github.com/blrchen/chatgpt-lite).
Problem:
The issue we observe with this chat template and Mistral is the following: often, after a couple of chat turns, Mistral starts generating repetitive responses and goes on for a very long time, as if it does not know when to stop. It tends to show this behavior especially on basic prompts such as "hi", "who are you?", and "tell me about yourself". Has anyone experienced such behavior? Do you spot any potential issue with the template we are using? Any hints would be highly appreciated!
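In case it helps with reproduction, here is a minimal sketch of how we drive the model with transformers. The decoding parameters (repetition_penalty, max_new_tokens, temperature) are illustrative guesses at mitigations rather than values we have validated, and the sketch assumes the chat template shown further below has already been assigned to tokenizer.chat_template:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hi, who are you?"},
]
# Renders the conversation through tokenizer.chat_template and tokenizes it.
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.1,               # mild penalty against verbatim loops
    eos_token_id=tokenizer.eos_token_id,  # let generation stop at </s>
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))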
The chat template is below. It is the same as https://github.com/chujiezheng/chat_templates/blob/main/chat_templates/mistral-instruct.jinja, only adapted to remove the redundant newlines:
{% if messages[0]['role'] == 'system' -%}
{% set loop_messages = messages[1:] -%}
{% set system_message = messages[0]['content'].strip() + '\n\n' -%}
{% else -%}
{% set loop_messages = messages -%}
{% set system_message = '' -%}
{% endif -%}
{{ bos_token -}}
{% for message in loop_messages -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') -}}
{% endif -%}
{% if loop.index0 == 0 -%}
{% set content = system_message + message['content'] -%}
{% else -%}
{% set content = message['content'] -%}
{% endif -%}
{% if message['role'] == 'user' -%}
{{ '[INST] ' + content.strip() + ' [/INST]' -}}
{% elif message['role'] == 'assistant' -%}
{{ ' ' + content.strip() + ' ' + eos_token -}}
{% endif -%}
{% endfor -%}
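For completeness, this is roughly how we attach and render the template with transformers (a sketch; CUSTOM_TEMPLATE is assumed to hold the Jinja source above as a single string):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer.chat_template = CUSTOM_TEMPLATE  # the Jinja template above

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "tell me about yourself"},
]
# Render without tokenizing so the exact prompt string can be inspected.
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)
# Expected: <s>[INST] You are a concise assistant.\n\ntell me about yourself [/INST]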
Many thanks for your help!
Similar issues with a fine-tuned model as well: it will sometimes work, but other times it generates massive dumps of information until it hits max tokens.
UPDATE: Ran a test; Instruct v1 does not seem to have this issue.
Right, I've run into this too. It feels like it got worse with Ooba 2-3 months ago, but maybe that's just me? I think a certain update made this issue worse.
Sliding window attention matters
What do you mean? Do I want it on or off? How do I tell? How do I change it? Is it baked into the quant, or is it a setting? Thanks.
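For anyone else wondering: sliding window attention is recorded in the model's config.json rather than being a sampler setting, so it effectively ships with the checkpoint (some backends do allow overriding it at load time). A quick way to inspect it, sketched below with the official Hugging Face repo IDs:

from transformers import AutoConfig

for model_id in (
    "mistralai/Mistral-7B-Instruct-v0.1",
    "mistralai/Mistral-7B-Instruct-v0.2",
):
    config = AutoConfig.from_pretrained(model_id)
    # v0.1 ships with sliding_window=4096, while v0.2 reportedly sets it
    # to null (full attention over the 32k context).
    print(model_id, "->", getattr(config, "sliding_window", None))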