Loop at the end of sentences
exllama:
Temperature: 0.95
Top-K: off
Top-P: 0.75
Min-P: off
Typical: 0.25
User:
Hello, How are you today?
Chatbot:
I am not sure how to answer that question because because because because because because because because because because because because … (repeats "because" until the output is cut off)
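For reference, here is an illustrative re-implementation of what the sampler settings above mean (temperature scaling, then typical-decoding and nucleus/top-p filtering). This is a sketch of the standard algorithms, not exllama's actual code; the function name and structure are my own:

```python
import numpy as np

def sample_filtered(logits, temperature=0.95, top_p=0.75, typical=0.25, rng=None):
    # Hypothetical helper sketching the settings above; not exllama's sampler.
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Typical filtering: keep tokens whose surprisal is closest to the
    # distribution's entropy, until their cumulative mass reaches `typical`.
    entropy = -(probs * np.log(probs + 1e-12)).sum()
    surprisal = -np.log(probs + 1e-12)
    order = np.argsort(np.abs(surprisal - entropy))
    cum = np.cumsum(probs[order])
    keep_n = max(1, np.searchsorted(cum, typical) + 1)
    mask = np.zeros_like(probs, dtype=bool)
    mask[order[:keep_n]] = True
    probs = np.where(mask, probs, 0.0)
    probs /= probs.sum()

    # Nucleus (top-p): keep the smallest set of tokens whose mass >= top_p.
    order = np.argsort(-probs)
    cum = np.cumsum(probs[order])
    keep_n = max(1, np.searchsorted(cum, top_p) + 1)
    mask = np.zeros_like(probs, dtype=bool)
    mask[order[:keep_n]] = True
    probs = np.where(mask, probs, 0.0)
    probs /= probs.sum()

    return rng.choice(len(probs), p=probs)
```

Note that none of these filters penalize repetition by themselves; with Top-K off and a low Typical cutoff, a model stuck in a high-probability loop will keep emitting the same token.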
The problem does not happen in AutoGPTQ.
I also tried lm-sys/FastChat with GPTQ-for-LLaMa, and this error appears:
File "C:\ProgramData\Miniconda\envs\cuda-env\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Float but found Half
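A minimal sketch of what that traceback means, assuming the usual cause: the GPTQ checkpoint's weights are float16 but the input tensor is still float32, so F.linear refuses to multiply them. The layer and tensor here are stand-ins, not FastChat's actual code:

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 4).half()   # float16 weights, like a GPTQ checkpoint
x = torch.randn(1, 4)            # float32 activations

try:
    layer(x)                     # same failure mode as the traceback above
except RuntimeError as e:
    print("dtype mismatch:", e)

# The fix is to make input and weight dtypes agree. On GPU you would
# normally cast the input (layer(x.half())); on CPU, cast the layer:
out = layer.float()(x)
print(out.dtype)                 # torch.float32
```

Which side to cast depends on where FastChat builds the model; the sketch only illustrates the error's mechanics.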
Is there any way to set the context scaling in exllama or FastChat?
I understand that adding " --alpha 4.0 " to exllama fixes the problem, but I can't find an equivalent option for FastChat.
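For context on what --alpha does: exllama's alpha is NTK-aware RoPE scaling, which stretches the rotary embedding base so the model tolerates longer contexts. A sketch assuming the commonly used formula base' = base * alpha ** (d / (d - 2)), with head_dim=128 (LLaMA's head dimension) as an assumption:

```python
def ntk_scaled_base(base=10000.0, alpha=4.0, head_dim=128):
    # Hedged sketch of NTK-aware RoPE scaling, not exllama's exact code:
    # alpha = 1.0 leaves the base unchanged; larger alpha stretches it.
    return base * alpha ** (head_dim / (head_dim - 2))
```

If FastChat exposes no equivalent flag, the same effect would require patching the rotary embedding base where the model is constructed.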