Mistral-7B-Instruct-v0.2 loopy text generation with custom chat template
Dear community,
We are using Mistral-7B-Instruct-v0.2 (off-the-shelf, no fine-tuning) with a custom chat template so that it accepts system prompts supplied as user input by GUI chat clients such as ChatBox (https://github.com/Bin-Huang/chatbox) and ChatGPT-lite (https://github.com/blrchen/chatgpt-lite).
Problem:
The issue we observe with this chat template and Mistral is the following: often, after a couple of chat turns, Mistral starts generating repetitive responses and goes on for a very long time, as if it does not know when to stop. It tends to show this behavior especially on basic prompts such as "hi", "who are you?", and "tell me about yourself". Has anyone experienced such behavior? Do you spot any potential issue with the template we are using? Any hints would be highly appreciated!
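In case it helps with reproduction, here is a minimal sketch of how we drive the model with transformers. The decoding parameters (repetition_penalty, max_new_tokens, temperature) are illustrative guesses at mitigations rather than values we have validated, and the sketch assumes the chat template shown further below has already been assigned to tokenizer.chat_template:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hi, who are you?"},
]
# Renders the conversation through tokenizer.chat_template and tokenizes it.
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.1,               # mild penalty against verbatim loops
    eos_token_id=tokenizer.eos_token_id,  # let generation stop at </s>
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))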
The chat template is below. It is the same as https://github.com/chujiezheng/chat_templates/blob/main/chat_templates/mistral-instruct.jinja, only adapted to remove the redundant newlines:
{% if messages[0]['role'] == 'system' -%}
{% set loop_messages = messages[1:] -%}
{% set system_message = messages[0]['content'].strip() + '\n\n' -%}
{% else -%}
{% set loop_messages = messages -%}
{% set system_message = '' -%}
{% endif -%}
{{ bos_token -}}
{% for message in loop_messages -%}
{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') -}}
{% endif -%}
{% if loop.index0 == 0 -%}
{% set content = system_message + message['content'] -%}
{% else -%}
{% set content = message['content'] -%}
{% endif -%}
{% if message['role'] == 'user' -%}
{{ '[INST] ' + content.strip() + ' [/INST]' -}}
{% elif message['role'] == 'assistant' -%}
{{ ' ' + content.strip() + ' ' + eos_token -}}
{% endif -%}
{% endfor -%}
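For completeness, this is roughly how we attach and render the template with transformers (a sketch; CUSTOM_TEMPLATE is assumed to hold the Jinja source above as a single string):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer.chat_template = CUSTOM_TEMPLATE  # the Jinja template above

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "tell me about yourself"},
]
# Render without tokenizing so the exact prompt string can be inspected.
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)
# Expected: <s>[INST] You are a concise assistant.\n\ntell me about yourself [/INST]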
Many thanks for your help!
Similar issues with a fine-tuned model as well: it will sometimes work, but other times it generates massive dumps of information until it hits max tokens.
UPDATE: Ran a test; Instruct v1 does not seem to have this issue.
Right, I've run into this too. It feels like it got worse with Ooba 2-3 months ago, but maybe that's just me? I think a certain update made this issue worse.
Sliding window attention matters
What do you mean? Do I want it on or off? How do I tell? How do I change it? Is it baked into the quant, or is it a setting? Thanks.
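For anyone else wondering: sliding window attention is recorded in the model's config.json rather than being a sampler setting, so it effectively ships with the checkpoint (some backends do allow overriding it at load time). A quick way to inspect it, sketched below with the official Hugging Face repo IDs:

from transformers import AutoConfig

for model_id in (
    "mistralai/Mistral-7B-Instruct-v0.1",
    "mistralai/Mistral-7B-Instruct-v0.2",
):
    config = AutoConfig.from_pretrained(model_id)
    # v0.1 ships with sliding_window=4096, while v0.2 reportedly sets it
    # to null (full attention over the 32k context).
    print(model_id, "->", getattr(config, "sliding_window", None))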