Add `eos_token` to the tokenizer config.
Browse filesIf we merge the ChatWidget right now, it would not work since the widget would not be able to format the chat_template correctly from the information available via API (https://huggingface.co/api/models/microsoft/DialoGPT-large?config=True). This is because to get `eos_token` one need to get `eos_token_id` from [config.json](https://huggingface.co/microsoft/DialoGPT-large/blob/main/config.json) and then reading [`vocab.json`](https://huggingface.co/microsoft/DialoGPT-large/blob/main/vocab.json) to check which token is associated with this id.
This PR fixes this by adding `eos_token` directly to `tokenizer_config.json` cc
@julien-c
@osanseviero
@sbrandeis
- tokenizer_config.json +2 -1
tokenizer_config.json
CHANGED
@@ -1,4 +1,5 @@
|
|
1 |
{
|
2 |
"model_max_length": 1024,
|
3 |
-
"chat_template": "{% for message in messages %}{{ message.content }}{{ eos_token }}{% endfor %}"
|
|
|
4 |
}
|
|
|
1 |
{
|
2 |
"model_max_length": 1024,
|
3 |
+
"chat_template": "{% for message in messages %}{{ message.content }}{{ eos_token }}{% endfor %}",
|
4 |
+
"eos_token": "<|endoftext|>"
|
5 |
}
|