Why is EOS token added at the end of chat template?
Why is the `eos_token` added at the end of the chat template when `add_generation_prompt` is `false`?
```jinja
{{ bos_token }}
{% for message in messages %}{{ '<|' + message['role'] + '|>' + '\n' + message['content'] + '<|end|>\n' }}{% endfor %}
{% if add_generation_prompt %}
{{ '<|assistant|>\n' }}
{% else %}{{ eos_token }}
{% endif %}
```
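For concreteness, here is how the template behaves through `apply_chat_template`; a minimal sketch, assuming the `microsoft/Phi-3-mini-4k-instruct` checkpoint (the model name is my assumption, substitute the tokenizer you are actually using):

```python
# A minimal sketch of how the template renders; checkpoint name is an assumption.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
messages = [{"role": "user", "content": "Hello!"}]

# add_generation_prompt=True appends '<|assistant|>\n', so the model
# continues with its reply:
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

# With add_generation_prompt=False, the template above instead appended
# the eos_token, so the rendered prompt ended in '<|endoftext|>':
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False))
```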
Since `eos_token == pad_token == <|endoftext|>`, it does not matter whether the tokenizer's padding side is set to left or right: the prompt rendered by this template always ends in the pad token, so generation will always fail because right padding is detected here.
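To illustrate why, here is a simplified sketch of the right-padding heuristic in transformers' `generate()` (paraphrased, not the exact library source): it only checks whether the last token of each row equals `pad_token_id`, so a prompt ending in `eos_token == pad_token` trips it even under left padding.

```python
# Simplified sketch of the right-padding heuristic in transformers'
# generate() (paraphrased, not the exact library source): it only checks
# whether the last token of each row equals pad_token_id.
import torch

def right_padding_detected(input_ids: torch.Tensor, pad_token_id: int) -> bool:
    return bool(torch.sum(input_ids[:, -1] == pad_token_id) > 0)

# Because the template appends eos_token and eos_token == pad_token, the
# prompt always ends in the pad token, so the check fires regardless of
# the actual padding side. Token ids below are hypothetical.
pad_token_id = 32000                              # hypothetical id for <|endoftext|>
prompt = torch.tensor([[1, 450, 32007, 32000]])   # ends in eos/pad either way
print(right_padding_detected(prompt, pad_token_id))  # True
```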
Or is there some other recommendation for which token to use as padding for further fine-tuning?
Thanks!
We updated the template and removed the `eos_token`. It was intended to be used during fine-tuning.
For fine-tuning, you can use `<|endoftext|>` as either `eos_token` or `pad_token`.
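If it helps, a minimal sketch of that fine-tuning setup, where `<|endoftext|>` serves as both `eos_token` and `pad_token` (the checkpoint name and `max_length` are my assumptions; adjust to your run):

```python
# A minimal sketch of the suggested setup: <|endoftext|> as both eos and pad.
# Checkpoint name and max_length are assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
tokenizer.pad_token = "<|endoftext|>"  # same token as eos_token here

# Terminate each training example with the eos token, then pad the batch.
text = "<|user|>\nHello!<|end|>\n<|assistant|>\nHi there.<|end|>\n"
batch = tokenizer(
    [text + tokenizer.eos_token],
    padding="max_length",
    max_length=64,
    return_tensors="pt",
)
print(batch["input_ids"].shape)
```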
> We updated the template and removed the eos_token. It was intended to be used during fine-tuning.
> For fine-tuning, you can use `<|endoftext|>` as either `eos_token` or `pad_token`.
But during inference, it's `<|end|>` as shown on the model card. Which one should be added during fine-tuning?