LLaMA-Factory inference issue

#2
by ianss - opened

Hello and thanks for sharing this model!

Is there any known issue with LLaMA-Factory? https://github.com/hiyouga/LLaMA-Factory

I'm having a hard time making inference work with the zephyr prompt template.
It produces garbage or no output even with the lowest temperature.

I have successfully used and fine-tuned other Mistral-type models with LLaMA-Factory.

Thanks!

Institute for Language and Speech Processing org
edited Mar 28

Hello, thank you for trying out our model!

I have managed to reproduce your issue; the model indeed exhibits significantly different behavior when prompts are processed through LLaMa-Factory compared to any other local implementation or framework we use.

Upon initial testing, it appears that the templates used by LLaMa-Factory impose defaults and make changes to the given prompt. These alterations interfere with the generation capabilities of the current version of the model. Unfortunately, it seems that a quick fix is not feasible at this time as it would probably require a language-specific fix for Greek.

Therefore, I would suggest exploring an alternative LoRA/inference framework for the time being:
https://github.com/mistralai-sf24/hackathon/tree/d9f22bd6c1d0a99df4077c9ac616ae3d8bb90b6d
https://github.com/huggingface/alignment-handbook/tree/main/recipes/zephyr-7b-beta
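
For reference, here is a rough sketch of running the model directly with transformers (outside LLaMa-Factory), assuming the released tokenizer ships the zephyr-style chat template; the generation settings are only illustrative:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ilsp/Meltemi-7B-Instruct-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Γεια σου"}]
# apply_chat_template builds the chat-formatted prompt the model expects
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=100, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))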

That being said, we will continue looking into it and provide an update when it's fixed.

Thank you for your prompt reply. Did you look at it in detail enough to understand exactly how they alter the input?

Before posting here, I tried a quick and dirty fix by adding a custom template for Meltemi in LLaMA-Factory/src/llmtuner/data/template.py:

_register_template(
    name="meltemi",
    format_user=StringFormatter(slots=["<|user|>\n{{content}}", {"eos_token"}, "<|assistant|>"]),
    format_assistant=StringFormatter(slots=["\n{{content}}", {"eos_token"}]),
    format_system=StringFormatter(slots=["<|system|>\n{{content}}", {"eos_token"}]),
    default_system="Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη.",
    stop_words=["</s>"],
    replace_eos=True,
    force_system=True,
)

but it didn't seem to do the trick.

Since I've gotten used to LLaMA-Factory, I may give fixing it a try if I understand how the input should be fed to your model.

Thanks!

Institute for Language and Speech Processing org
edited Mar 28

I'd also take a look at the StringFormatter class in src/llmtuner/data/formatter.py

Thank you for trying Meltemi out and please let us know if you have any success.

Closing the discussion for now.

droussis changed discussion status to closed
Institute for Language and Speech Processing org
edited Mar 28

Just to add, in Ollama the following template works, so I think the same should be followed in LLaMA-Factory. The one you use looks close, but I'm not very familiar with their templating scheme:

{{- if .System }}
<|system|>
{{ .System }}
</s>
{{- end }}
<|user|>
{{ .Prompt }}
</s>
<|assistant|>
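
For a single user turn with a system message, that template should render to roughly the following prompt string (a reconstruction from the template above; exact whitespace depends on Ollama's handling):

<|system|>
(system message)
</s>
<|user|>
Γεια σου
</s>
<|assistant|>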

For anyone else who finds this useful: the model seems to expect Windows newlines instead of POSIX ones (i.e. use \r\n instead of \n):

_register_template(
    name="meltemi",
    format_user=StringFormatter(slots=["<|user|>\r\n{{content}}", {"eos_token"}, "<|assistant|>"]),
    format_assistant=StringFormatter(slots=["\r\n{{content}}", {"eos_token"}]),
    format_system=StringFormatter(slots=["<|system|>\r\n{{content}}", {"eos_token"}]),
    default_system="Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη.",
    force_system=True,
)

This resolves the issue.

By the way, is this a Meltemi issue?

LVouk changed discussion status to open
Institute for Language and Speech Processing org


Wow, thank you very much.

I'm curious, do you use Windows?

No, I am on Debian Linux. On the same machine, Meltemi runs fine on the vLLM engine.

I am very curious about what's going on, because I have used various models on various engines and I never had to bother with newlines.

I can only guess that there is an inconsistency between engines in how they treat newlines, but most models are trained to treat \n and \r\n sequences as the same.
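
A quick way to check this on the tokenizer side (just a sketch; I haven't verified what the actual token ids look like for Meltemi):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("ilsp/Meltemi-7B-Instruct-v1")
for text in ["<|user|>\nΓεια σου", "<|user|>\r\nΓεια σου"]:
    ids = tok(text, add_special_tokens=False).input_ids
    # if \n and \r\n map to different token sequences, the two prompts diverge
    print(repr(text), "->", tok.convert_ids_to_tokens(ids))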

Institute for Language and Speech Processing org
edited Mar 28

Hello again,
I'm happy to return and see you found a solution! Good catch!
That is very weird indeed; we will investigate on our end. Through which means are you calling LLaMa-Factory? (e.g. API, CLI, or a script/notebook)

I faced this issue using either web interface, i.e.:

python3 src/web_demo.py --model_name_or_path 'ilsp/Meltemi-7B-Instruct-v1' --template zephyr
or
python3 src/train_web.py
and choosing a custom model (ilsp/Meltemi-7B-Instruct-v1) and selecting the zephyr template. Switching to "default" or other templates does not help.

LLaMA-Factory is installed in a Python venv using only pip install -r requirements.txt.

I also tried calling the inference engine through the API:

python3 src/api_demo.py --model_name_or_path 'ilsp/Meltemi-7B-Instruct-v1' --template zephyr

curl -X 'POST' \
  'http://localhost:8000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "string",
    "messages": [
      {
        "role": "user",
        "content": "Γεια σου"
      }
    ],
    "tools": [],
    "do_sample": true,
    "temperature": 0,
    "top_p": 0,
    "n": 1,
    "max_tokens": 100,
    "stream": false
  }'

which still produces garbage:

{"id":"chatcmpl-default","object":"chat.completion","created":1711654338,"model":"string","choices":[{"index":0,"message":{"role":"assistant","content":"4000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000","tool_calls":null},"finish_reason":"length"}],"usage":{"prompt_tokens":40,"completion_tokens":100,"total_tokens":140}}

Switching to the custom "meltemi" template in which I used \r\n produces:
python3 src/api_demo.py --model_name_or_path 'ilsp/Meltemi-7B-Instruct-v1' --template meltemi
....
{"id":"chatcmpl-default","object":"chat.completion","created":1711654422,"model":"string","choices":[{"index":0,"message":{"role":"assistant","content":"\nΓεια σου! Πώς μπορώ να σε βοηθήσω σήμερα; Εάν έχετε οποιεσδήποτε ερωτήσεις ή χρειάζεστε βοήθεια, μη διστάσετε να ρωτήσετε.","tool_calls":null},"finish_reason":"stop"}],"usage":{"prompt_tokens":89,"completion_tokens":31,"total_tokens":120}}

Institute for Language and Speech Processing org
edited Mar 28

Thanks for the info! And for finding the LLaMa-Factory side workaround!
We'll investigate on our end to find where the issue lies and produce a more definitive fix.

Institute for Language and Speech Processing org
edited Mar 30

So, after looking into it a bit more, it looks like LLaMa-Factory's tokenization skips adding a bos_token and instead relies on the templates themselves containing one.

Meltemi is sensitive to the presence of a bos_token, so a template like the following

_register_template(
    name="meltemi_with_bos",
    format_user=StringFormatter(slots=["<|user|>\n{{content}}", {"eos_token"}, "<|assistant|>"]),
    format_assistant=StringFormatter(slots=["\n{{content}}", {"eos_token"}]),
    format_system=StringFormatter(slots=[{"bos_token"}, "<|system|>\n{{content}}", {"eos_token"}]),
    default_system="Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη.",
    force_system=True,
)

emulates the success of the template you shared.
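
One way to sanity-check the bos_token difference outside LLaMa-Factory (a sketch; the prompt string here is just a stand-in for whatever the template renders):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("ilsp/Meltemi-7B-Instruct-v1")
prompt = "<|system|>\n...\n"  # stand-in for a rendered template
ids = tok(prompt, add_special_tokens=False).input_ids  # mimics template-driven tokenization with no automatic special tokens
# without an explicit {"bos_token"} slot in the template, the sequence does not start with bos_token_id
print(ids[0] == tok.bos_token_id, tok.bos_token_id)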

The answer to why changing the newline to a Windows newline in the template kickstarts it seems to lie with the Mistral model pretraining.
I'll continue looking into it, but I'll close the issue for now.

LVouk changed discussion status to closed

~~Thanks for looking further into this, however using the above template still produces garbage for me.~~ Yes, it's working fine. As a matter of fact, it seems to work well either by using the bos_token or the \r\n above.

However, it seems to make grammatical or morphological mistakes without the \r\n.

Just for the record, after a few prompts it seems like I get the best replies by combining both of the above templates, especially for longer prompts:

_register_template(
    name="meltemi",
    format_user=StringFormatter(slots=["<|user|>\r\n{{content}}", {"eos_token"}, "<|assistant|>"]),
    format_assistant=StringFormatter(slots=["\r\n{{content}}", {"eos_token"}]),
    format_system=StringFormatter(slots=[{"bos_token"}, "<|system|>\r\n{{content}}", {"eos_token"}]),
    default_system="Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη.",
    force_system=True,
)

Just managed to run this using vLLM thanks to this thread. The \r\n trick in the template is a lifesaver. For reference, this is the chat template, a slight modification of the default one:

{% for message in messages %}
{% if message['role'] == 'user' %}
{{ '<|user|>\r\n' + message['content'] + eos_token }}
{% elif message['role'] == 'system' %}
{{ '<|system|>\r\n' + message['content'] + eos_token }}
{% elif message['role'] == 'assistant' %}
{{ '<|assistant|>\r\n' + message['content'] + eos_token }}
{% endif %}
{% if loop.last and add_generation_prompt %}
{{ '<|assistant|>' }}
{% endif %}
{% endfor %}
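
In case it helps anyone else, here is a rough sketch of offline vLLM inference using that template via the tokenizer's apply_chat_template (the file name and sampling settings are just placeholders):

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "ilsp/Meltemi-7B-Instruct-v1"
# the Jinja template above, saved to a file (placeholder name)
custom_template = open("meltemi_chat_template.jinja").read()

tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [{"role": "user", "content": "Γεια σου"}]
prompt = tokenizer.apply_chat_template(
    messages, chat_template=custom_template, tokenize=False, add_generation_prompt=True
)

llm = LLM(model=model_id)
outputs = llm.generate([prompt], SamplingParams(temperature=0.0, max_tokens=100))
print(outputs[0].outputs[0].text)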
