Mistral-7B-Instruct-v0.2 loopy text generation with custom chat template

#68
by ercanucan - opened

Dear community,

We are using Mistral-7B-Instruct-v0.2 (off-the-shelf, no fine-tuning etc.) with a chat template in order to accept the System prompts (as user input) from various GUI chat clients such as ChatBox (https://github.com/Bin-Huang/chatbox) and ChatGPT-lite (https://github.com/blrchen/chatgpt-lite).

Problem:

The issue we observe with this chat template together with Mistral is the following: oftentimes (after a couple of chat turns), Mistral starts generating repetitive responses and goes on for a very long time, as if it does not know when to stop. It tends to show this behavior especially on basic questions such as hi, who are you?, and tell me about yourself. Has anyone experienced such behavior? Do you spot any potential issue with the template we are using? Any hints here would be highly appreciated!
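(Not a fix for the template itself, but for anyone tuning sampling settings against this kind of looping: the `repetition_penalty` knob exposed by most inference stacks, including transformers' `generate()`, works roughly like the toy sketch below. The function name and the numbers are illustrative only.)

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.1):
    """Toy reimplementation of the CTRL-style repetition penalty used by
    common inference stacks: positive scores of already-generated tokens
    are divided by the penalty, negative ones multiplied, so that tokens
    the model has already emitted become less likely to repeat."""
    out = list(logits)
    for tok in set(generated_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

# Token 2 was already generated, so its (positive) score 3.0 is damped:
print(apply_repetition_penalty([1.0, -0.5, 3.0], generated_ids=[2], penalty=1.5))
```

Values above 1.0 discourage repeats; too-high values degrade fluency, so people typically stay in the 1.05-1.2 range.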

The chat template is below; it is the same as https://github.com/chujiezheng/chat_templates/blob/main/chat_templates/mistral-instruct.jinja, only adapted to remove the redundant newlines:

{% if messages[0]['role'] == 'system' -%}
    {% set loop_messages = messages[1:] -%}
    {% set system_message = messages[0]['content'].strip() + '\n\n' -%}
{% else -%}
    {% set loop_messages = messages -%}
    {% set system_message = '' -%}
{% endif -%}
{{ bos_token -}}
{% for message in loop_messages -%}
    {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}
        {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
    {% endif -%}
    {% if loop.index0 == 0 -%}
        {% set content = system_message + message['content'] -%}
    {% else -%}
        {% set content = message['content'] -%}
    {% endif -%}
    {% if message['role'] == 'user' -%}
        {{ '[INST] ' + content.strip() + ' [/INST]' -}}
    {% elif message['role'] == 'assistant' -%}
        {{ ' ' + content.strip() + ' ' + eos_token -}}
    {% endif -%}
{% endfor -%}
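For anyone who wants to check what the template actually produces before blaming the model, here is a small harness (illustrative, not our serving setup) that renders it with plain jinja2 so you can inspect the exact prompt string the model receives:

```python
from jinja2 import Environment

# The template from the post above, embedded verbatim as a Python string.
CHAT_TEMPLATE = """\
{% if messages[0]['role'] == 'system' -%}
    {% set loop_messages = messages[1:] -%}
    {% set system_message = messages[0]['content'].strip() + '\\n\\n' -%}
{% else -%}
    {% set loop_messages = messages -%}
    {% set system_message = '' -%}
{% endif -%}
{{ bos_token -}}
{% for message in loop_messages -%}
    {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}
        {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
    {% endif -%}
    {% if loop.index0 == 0 -%}
        {% set content = system_message + message['content'] -%}
    {% else -%}
        {% set content = message['content'] -%}
    {% endif -%}
    {% if message['role'] == 'user' -%}
        {{ '[INST] ' + content.strip() + ' [/INST]' -}}
    {% elif message['role'] == 'assistant' -%}
        {{ ' ' + content.strip() + ' ' + eos_token -}}
    {% endif -%}
{% endfor -%}
"""

def raise_exception(message):
    # transformers injects this helper when applying chat templates;
    # this is a stand-in so the template also runs under plain jinja2.
    raise ValueError(message)

env = Environment()
env.globals["raise_exception"] = raise_exception
template = env.from_string(CHAT_TEMPLATE)

prompt = template.render(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "who are you?"},
    ],
    bos_token="<s>",
    eos_token="</s>",
)
print(repr(prompt))
```

Rendering with `repr()` makes stray whitespace visible: the system prompt gets folded into the first `[INST]` block, and each assistant turn ends with `</s>` directly followed by the next `[INST]`.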

Many thanks for your help!

Similar issues with a fine-tuned model as well: it will sometimes work, but other times it generates a massive flood of text until it hits max tokens.

UPDATE: Ran a test, Instruct v1 does not seem to have this issue.

Right, I've run into this too. It feels like it got worse with Ooba 2-3 months ago, but maybe that's just me? I think a certain update made this issue worse.

Sliding window attention matters


What do you mean? Do I want it on or off? How do I tell? How do I change it? Is it baked into the quant, or is it a setting? Thanks.
