Is <BOS_TOKEN> really needed?
Do various frontends insert it automatically, or does it really need to be added manually?
It feels like when I add it manually in koboldcpp, I get slightly worse results, though it may just be confirmation bias.
The tokenizer is defined to add the BOS on its own. Depending on the tool and how it uses the tokenizer, the BOS might already be added for you, unfortunately.
Using transformers:
inputs = tokenizer("This is a test") # Adds BOS automatically
inputs = tokenizer("This is a test", add_special_tokens=False) # Does not.
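If you can't tell whether your tool already prepends BOS, one option is to normalise the token IDs after tokenizing so there is exactly one at the front. A minimal sketch, assuming a hypothetical helper name (ensure_single_bos is not part of transformers) and a placeholder BOS ID of 5 that is only there for the example, not the real ID for this model:

```python
from typing import List


def ensure_single_bos(input_ids: List[int], bos_token_id: int) -> List[int]:
    """Return input_ids with exactly one leading BOS token.

    Hypothetical helper for illustration; not a transformers API.
    """
    start = 0
    # Skip every BOS at the front (handles zero, one, or a doubled BOS).
    while start < len(input_ids) and input_ids[start] == bos_token_id:
        start += 1
    return [bos_token_id] + list(input_ids[start:])


# Placeholder IDs purely for illustration.
BOS = 5
print(ensure_single_bos([BOS, BOS, 100, 200], BOS))  # doubled BOS collapsed
print(ensure_single_bos([100, 200], BOS))            # missing BOS added
```

With transformers you would pass tokenizer("...")["input_ids"] and tokenizer.bos_token_id instead of the placeholder values.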

It would also be nice if Cohere actually added the chat template that should be used with this model...
Do various frontends insert automatically or is adding it manually really needed?
llama.cpp and Ollama should add it as it's defined in the GGUF file:
https://github.com/ollama/ollama/issues/3650
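For tools that add BOS themselves, the safest thing is to not type it into the prompt at all. If you have saved prompts that already start with it, something like the following could clean them up. This is only a sketch: strip_manual_bos is a made-up name, and whether a frontend would actually parse a literal "<BOS_TOKEN>" in the prompt text as the special token is an assumption that depends on the frontend.

```python
BOS_TEXT = "<BOS_TOKEN>"  # literal spelling used in this thread


def strip_manual_bos(prompt: str, bos_text: str = BOS_TEXT) -> str:
    """Remove any manually typed BOS text from the start of a prompt.

    Hypothetical helper; use it only when the frontend adds BOS for you.
    """
    if not bos_text:
        return prompt  # nothing to strip; also avoids an endless loop
    while prompt.startswith(bos_text):
        prompt = prompt[len(bos_text):]
    return prompt


print(strip_manual_bos("<BOS_TOKEN>This is a test"))
```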
Sadly the official Ollama template is messed up and has an extra <|END_OF_TURN_TOKEN|> added on the end.
It should be:
TEMPLATE """{{if .System}}<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{{.System}}<|END_OF_TURN_TOKEN|>{{end}}<|START_OF_TURN_TOKEN|><|USER_TOKEN|>{{.Prompt}}<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{{.Response}}"""
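To see what that template expands to before the model starts generating (i.e. with an empty response), here is a rough Python equivalent. The function name build_prompt is made up for illustration; the token strings are copied from the template above. It deliberately does not add BOS, on the assumption that llama.cpp/Ollama add it from the GGUF as described earlier.

```python
from typing import Optional

START = "<|START_OF_TURN_TOKEN|>"
END = "<|END_OF_TURN_TOKEN|>"


def build_prompt(user: str, system: Optional[str] = None) -> str:
    """Mirror the corrected template with an empty response.

    Hypothetical helper; no BOS is added here on purpose.
    """
    parts = []
    if system:
        # Optional system turn, same as the {{if .System}} branch.
        parts.append(f"{START}<|SYSTEM_TOKEN|>{system}{END}")
    parts.append(f"{START}<|USER_TOKEN|>{user}{END}")
    # Open the chatbot turn and stop: no trailing END_OF_TURN token.
    parts.append(f"{START}<|CHATBOT_TOKEN|>")
    return "".join(parts)


print(build_prompt("Write a haiku.", system="You are a poet."))
```

The point of the fix is visible in the last append: the prompt ends with the open chatbot turn, so the model writes its reply and emits <|END_OF_TURN_TOKEN|> itself.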
I nearly deleted this model because it was so bad at coding, and its 1-shot stories kinda sucked. But oh boy is it good if you guide it!!! :O
Goliath-120B was definitely better at 1-shot Grimdark, but always got too distracted and wanted to carry on writing and not refine things, but this model seems absolutely amazing at refining its initial attempt.
It's the first model where I've actually been able to say "let's try to merge the best bits, XYZ, of your last draft(s) with this latest draft" and it actually listened...