Fix add_generation_prompt in tokenizer_config.json
#24
by
irenedea
- opened
We should only add generation prompt after the last message.
Code
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-30b-chat', revision='refs/pr/23')
chat = [{
'content':
'Please summarize the goals in this text:\n\nGoing outside has benefits include reducing stress and triggering the relaxation response, which can help us not only feel better mentally, but even heal faster from physical ailments.',
'role':
'user'
}, {
'content': 'You should go outside and touch grass.',
'role': 'assistant'
}, {
'content': 'What else can I do?',
'role': 'user'
}
]
print('\nBEFORE!')
print(tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True))
tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-30b-chat', revision='refs/pr/24')
print('\nAFTER!')
print(tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True))
Output
BEFORE!
<|im_start|>system
A conversation between a user and an LLM-based AI assistant. The assistant gives helpful and honest answers.
<|im_start|>user
Please summarize the goals in this text:
Going outside has benefits include reducing stress and triggering the relaxation response, which can help us not only feel better mentally, but even heal faster from physical ailments.<|im_end|>
<|im_start|>assistant
<|im_start|>assistant
You should go outside and touch grass.<|im_end|>
<|im_start|>assistant
<|im_start|>user
What else can I do?<|im_end|>
<|im_start|>assistant
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
AFTER!
<|im_start|>system
A conversation between a user and an LLM-based AI assistant. The assistant gives helpful and honest answers.
<|im_start|>user
Please summarize the goals in this text:
Going outside has benefits include reducing stress and triggering the relaxation response, which can help us not only feel better mentally, but even heal faster from physical ailments.<|im_end|>
<|im_start|>assistant
You should go outside and touch grass.<|im_end|>
<|im_start|>user
What else can I do?<|im_end|>
<|im_start|>assistant
ty!
sam-mosaic
changed pull request status to
merged
irenedea
changed pull request title from
Update tokenizer_config.json
to Fix add_generation_prompt in tokenizer_config.json