Request: fix the inference script so the text output generation can be controlled
Thank you for the model that can be loaded in 8-bit.
When I try to run inference, it keeps generating text, whereas I just want an answer to my question.
Could you edit the inference script so that we only get precise answers and not uncontrolled generations?
I've attached a screenshot below of the output I'm getting; I hope you can suggest a fix for this.
There is no easy fix for this; it's a typical problem in any library.
As in https://github.com/michaelfeil/hf-hub-ctranslate2/blob/e236f006593fb00633f9874fe414c87bd9735813/hf_hub_ctranslate2/translate.py#LL309C1-L334C70
you might want to set end_token=["User", "user"] or so.
Thanks for the reply, @michaelfeil.
I noticed that you can get perfect answers in the mpt-7b-chat space: https://huggingface.co/spaces/mosaicml/mpt-7b-chat
Could you have a look at its app.py file? Maybe it suggests a fix for the issue.
How would you rewrite the inference example you mentioned in the README to resolve the issue, at least partially? Could you paste the updated inference script in your next reply?
For example, in the script below:
from hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model_name = "michaelfeil/ct2fast-mpt-7b-chat"
model = GeneratorCT2fromHfHub(
    # load in int8 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
)
outputs = model.generate(
    text=["User: what is the largest mountain on mars? Bot:"],
    max_length=256,
)
print(outputs)
It would be great if the output were:
Bot: The largest mountain on Mars is Olympus Mons.
It actually gives the right answer, but then it continues with some other text. I want it to stop at "Olympus Mons". @michaelfeil
@michaelfeil, you can also check this notebook for an example: https://colab.research.google.com/drive/1s2beg55SEV8ICHufhwcflYtUNskTX8kO
It ends the sentence perfectly!
You might set end_token=["User"] as a keyword argument to model.generate.
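Roughly like this (untested sketch, reusing the model from the README example above; whether "User" lines up with the tokenizer's actual tokens is not guaranteed):

# Untested sketch: `model` is the GeneratorCT2fromHfHub instance from the
# README example above; end_token is passed as an extra stop condition.
outputs = model.generate(
    text=["User: what is the largest mountain on mars? Bot:"],
    end_token=["User"],  # assumption: plain strings are accepted as stop words
    max_length=256,
)
print(outputs)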
@michaelfeil, I updated the code as follows:
from hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model_name = "michaelfeil/ct2fast-mpt-7b-chat"
model = GeneratorCT2fromHfHub(
    # load in int8 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
)
outputs = model.generate(
    text=["User: what is the largest mountain on mars? Bot:"],
    end_token=["User"],
    max_length=256,
)
print(outputs)
but I am still getting the same continuous output:
"User" and "Bot" are not special tokens either; they are just dummies I used for a bunch of other models.
Not sure why the stop tokens do not work in this case.
I can't provide support here, but if you find a good solution, feel free to share it.
Okay, the stop tokens are ["<|im_end|>", "<|endoftext|>"],
and messages should be formatted in the style
[
    f"<|im_start|>user\n{item[0]}<|im_end|>",
    f"<|im_start|>assistant\n{item[1]}<|im_end|>",
]
See the app.py in the mpt-7b-chat space.
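Roughly, something like this (untested sketch; it assumes end_token also accepts the ChatML markers as stop strings for this model):

from hf_hub_ctranslate2 import GeneratorCT2fromHfHub

model_name = "michaelfeil/ct2fast-mpt-7b-chat"
model = GeneratorCT2fromHfHub(
    # load in int8 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
)

# Build the prompt in the ChatML style used by mpt-7b-chat
# (<|im_start|>role ... <|im_end|>), leaving an open assistant turn.
prompt = (
    "<|im_start|>user\nwhat is the largest mountain on mars?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

outputs = model.generate(
    text=[prompt],
    # stop as soon as one of the chat end markers is generated
    end_token=["<|im_end|>", "<|endoftext|>"],
    max_length=256,
)
print(outputs)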
@michaelfeil, could you paste the whole script here if possible?
Sorry, can’t provide further help.