End-of-sequence (</s>) does not appear to be predicted in reasoning prompts
Is it normal to have responses like this when using only the pre-trained model (Mistral-7B-v0.1)?
Prompt format: "Q: {prompt}\nA:"
prompt: " Q: I have 3 apples, my dad has 2 more apples than me, how many apples do we have in total?\nA:"
Example output:
"Q: I have 3 apples, my dad has 2 more apples than me, how many apples do we have in total?
A: 5
Q: I have 3 apples, my dad has"
Shouldn't the model predict the end-of-sequence token (</s>) right after the 5?
Code:

import torch
from transformers import pipeline

# model and tokenizer are the already-loaded Mistral-7B-v0.1 checkpoint
pipeline_inst = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    use_cache=True,
    pad_token_id=tokenizer.eos_token_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

def generate_response(prompt):
    # Greedy decoding; the pipeline returns the prompt plus the continuation
    generated_text = pipeline_inst(
        prompt,
        do_sample=False,
        max_new_tokens=20,
    )
    return generated_text[0]["generated_text"]
*Note that I'm using a quantized model, but this behavior seems to occur even without quantization.
I encountered the same problem. It persisted even after I fine-tuned the model, so for now I can only post-process the response with a regex (sketched after the snippet below).
prompts = batch["prompt"]
inputs = tokenizer(prompts, padding="max_length", max_length=512, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}
# Generation keeps running until max_new_tokens because </s> is rarely produced
generated_ids = ft_model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
decoded = tokenizer.batch_decode(generated_ids)
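For reference, this is roughly the regex post-processing I mean. It's a minimal sketch that assumes the "Q: ...\nA:" prompt template used above; the extract_answer helper is my own name, not part of any library:

import re

def extract_answer(generated_text: str) -> str:
    # Keep only the text after the first "A:" and cut it off at the next
    # "Q:" (a hallucinated follow-up question) or at </s>, whichever comes first.
    match = re.search(r"A:\s*(.*?)(?:\nQ:|</s>|$)", generated_text, flags=re.DOTALL)
    return match.group(1).strip() if match else generated_text.strip()

answers = [extract_answer(text) for text in decoded]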
Have you found any solutions?
I haven't found anything much better than writing my own StoppingCriteria that stops the model on a multi-token sequence, but it's still a somewhat flawed heuristic. I followed this discussion to create it: Here.
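Roughly, the idea looks like this. It's a sketch under the "Q:/A:" prompt format from above; the StopOnTokens class name and the choice of "\nQ:" as the stop sequence are mine, not from the linked discussion, and tokenizing the stop string separately is exactly the flaky part (the ids may not always line up with what the model actually generates):

import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnTokens(StoppingCriteria):
    """Stop generation once the last generated tokens match a given id sequence."""

    def __init__(self, stop_ids: list[int]):
        self.stop_ids = stop_ids

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Compare the tail of the (single) generated sequence to the stop ids.
        return input_ids[0][-len(self.stop_ids):].tolist() == self.stop_ids

# "\nQ:" marks the start of a new hallucinated question, so stop there.
stop_ids = tokenizer("\nQ:", add_special_tokens=False).input_ids
stopping_criteria = StoppingCriteriaList([StopOnTokens(stop_ids)])

output_ids = model.generate(
    **tokenizer(prompt, return_tensors="pt").to(model.device),
    max_new_tokens=64,
    do_sample=False,
    stopping_criteria=stopping_criteria,
)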
For answers that require less reasoning it does generate the </s>, but when used in a question-and-answer template it often repeats the question and only occasionally emits the </s>. I also tested the instruct version, and it is actually much better at this.
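If it helps, this is roughly how I prompt the instruct model. A minimal sketch assuming Mistral-7B-Instruct-v0.1 and its built-in chat template; the variable names are mine:

from transformers import AutoModelForCausalLM, AutoTokenizer

instruct_id = "mistralai/Mistral-7B-Instruct-v0.1"
tok = AutoTokenizer.from_pretrained(instruct_id)
instruct_model = AutoModelForCausalLM.from_pretrained(instruct_id, device_map="auto")

messages = [{"role": "user", "content": "I have 3 apples, my dad has 2 more apples than me, how many apples do we have in total?"}]
input_ids = tok.apply_chat_template(messages, return_tensors="pt").to(instruct_model.device)

# The instruct model was trained to close its answers with </s>,
# so generation usually stops on eos without extra stopping criteria.
output_ids = instruct_model.generate(input_ids, max_new_tokens=64, do_sample=False)
print(tok.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=False))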