Endless Spaces
Tried the GGUF version with Ollama, using the correct template, but it generates endless empty spaces and lines after each answer.
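Adding a stop parameter on top of the existing TEMPLATE in the Modelfile seems to work around it for now, assuming the model still emits the ChatML end token as plain text (untested sketch; the GGUF filename is a placeholder):

FROM ./discolm_german_7b_v1.Q4_K_M.gguf
PARAMETER stop "<|im_end|>"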
Same here. SillyTavern shows many ' \n' at the end of the generation, up to the full context size, but filters them out before displaying the response in the WebUI. This slows down the response time a lot.
Update: it doesn't happen with the original files here, only with the GGUF, so the issue is on that side, not here.
Same here. I converted it to GGUF myself, but the problem persists.
This should be fixed now with this change, sorry for the oversight: https://huggingface.co/DiscoResearch/DiscoLM_German_7b_v1/commit/560f972f9f735fc9289584b3aa8d75d0e539c44e
Will ping TheBloke to re-upload the quants as soon as we've confirmed everything is working. Thanks everybody for reporting this issue!
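For anyone who wants to verify the fix locally, a quick sanity check (minimal sketch, assuming only that transformers is installed) is to confirm the tokenizer now reports the ChatML end-of-turn token as EOS:

from transformers import AutoTokenizer

# Fetch the updated tokenizer config from the repo (not a stale local cache)
tokenizer = AutoTokenizer.from_pretrained("DiscoResearch/DiscoLM_German_7b_v1")

print(tokenizer.eos_token)     # expected: <|im_end|>
print(tokenizer.eos_token_id)  # expected: 32000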
I have exactly this problem if I use this model with LangChain and do not explicitly set the eos_token_id:
from langchain_community.llms import HuggingFacePipeline
from transformers import BitsAndBytesConfig

# quantization_config was not shown above; assuming a 4-bit BitsAndBytesConfig here
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

# Load DiscoLM German 7b (a Mistral-based model)
llm = HuggingFacePipeline.from_model_id(
    model_id='DiscoResearch/DiscoLM_German_7b_v1',
    task='text-generation',
    model_kwargs={
        'temperature': 0.3,
        'max_length': 1024,
        'quantization_config': quantization_config,
        'low_cpu_mem_usage': True,
    },
    pipeline_kwargs={
        'max_new_tokens': 2000,
        'eos_token_id': 32000,  # <|im_end|> -- needed to avoid the "endless spaces"!
    },
    device_map='auto',
    device=None,  # must stay None when device_map is set
)
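For completeness, this is how I then call it; the prompt follows the ChatML format the model expects (a sketch, with a made-up example question):

prompt = (
    "<|im_start|>system\nDu bist ein hilfreicher Assistent.<|im_end|>\n"
    "<|im_start|>user\nWas ist die Hauptstadt von Deutschland?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
print(llm.invoke(prompt))  # now stops cleanly instead of padding with spaces/newlines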