What does the tokenization for fill-in-the-middle requests look like?
#5 · by XeIaso · opened
I'm looking at messing around with the fill-in-the-middle support for Codestral, but I can't figure out how to use it. I see that there's a FIMRequest class, but I want to know which tokens I should use with llama.cpp.
Thanks for making these models! They're a lot of fun to use personally and professionally.
Based on mistralai/mistral-common/tokens/tokenizers/sentencepiece.py#L335 and mistralai/mistral-common/tokens/tokenizers/base.py#L10, the prompt should look like

`<s>[SUFFIX]suffix_code[PREFIX]prefix_code`

with the EOS token `</s>` as the stopping condition. However, I also see a [MIDDLE] token which isn't used; maybe I'm forgetting something?
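For reference, here's a minimal sketch of a FIM request against llama.cpp's built-in server, assuming it's running locally on the default port and parses `[SUFFIX]`/`[PREFIX]` in the prompt string as special tokens (recent builds do this for `/completion`). The prefix/suffix values are made up, and `<s>` is left out because llama.cpp typically prepends BOS itself:

```python
import requests

# Hypothetical prefix/suffix; substitute your own code context.
prefix_code = "def fib(n):\n    "
suffix_code = "\n    return fib(n - 1) + fib(n - 2)"

# Codestral FIM layout: suffix first, then prefix; the model generates
# the middle directly after the prefix. llama.cpp usually adds the BOS
# token <s> on its own, so it is omitted from the string here.
prompt = f"[SUFFIX]{suffix_code}[PREFIX]{prefix_code}"

resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": prompt,
        "n_predict": 128,
        # llama.cpp stops at the model's EOS token on its own; the
        # explicit stop string is just a belt-and-braces guard.
        "stop": ["</s>"],
    },
)
print(resp.json()["content"])
```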
My expectation is that [MIDDLE] is used as the last token of the prompt, right before the generated response.
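One way to check is to encode a FIMRequest with mistral-common and inspect the rendered prompt. A sketch, assuming Codestral uses the v3 tokenizer and that the import paths below match your mistral-common version; the prefix/suffix are made-up examples:

```python
from mistral_common.protocol.fim.request import FIMRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load the v3 tokenizer shipped with mistral-common.
tokenizer = MistralTokenizer.v3()

request = FIMRequest(
    prompt="def fib(n):\n    ",                     # the prefix
    suffix="\n    return fib(n - 1) + fib(n - 2)",  # the suffix
)

tokenized = tokenizer.encode_fim(request)

# .text shows the rendered prompt, e.g. "<s>[SUFFIX]...[PREFIX]...";
# check whether [MIDDLE] appears anywhere in it.
print(tokenized.text)
print("[MIDDLE]" in tokenized.text)
```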