Receive empty response, regardless of which loader I choose
There is something strange happening (the same happens with the new Nous-Hermes): the model loads, but whatever message I send, I receive a totally empty response (it does not even show my question). I tried this with AutoGPTQ, ExLlama and GPTQ-for-LLaMA. I'll show you an example:
After loading the model, if I try to send "Hello", this happens:
Am I missing something?
Thank you, and sorry for bothering you. I hope this can help.
Yeah, there's something wrong with your install. Have you updated text-generation-webui to the latest version? Are you using the one-click installer or a manual install? If manual, make sure to git pull both text-generation-webui and exllama, and to re-run pip3 install -r requirements.txt in text-generation-webui.
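In Colab, that update would look roughly like this (assuming the repo lives at /content/text-generation-webui and exllama was cloned under repositories/exllama, as in the commands later in this thread):
%cd /content/text-generation-webui
!git pull
!pip3 install -r requirements.txt
# also update exllama if it is installed under repositories/
%cd repositories/exllama
!git pull
!pip3 install -r requirements.txt
%cd /content/text-generation-webui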
I did this inside a Colab:
!git clone https://github.com/oobabooga/text-generation-webui
%cd text-generation-webui
!pip install -r requirements.txt
and I only run this if I want to use GPTQ-for-LLaMA:
%mkdir /content/text-generation-webui/repositories/
%cd /content/text-generation-webui/repositories/
!git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
%cd GPTQ-for-LLaMa
!pip install ninja
!pip install -r requirements.txt
!python setup_cuda.py install
And what about exllama? Did you install that? You need to install it before you use it
# in text-generation-webui directory
!mkdir repositories
!git clone https://github.com/turboderp/exllama repositories/exllama
!pip3 install -r repositories/exllama/requirements.txt
Yes, I also installed ExLlama, but the problem I showed previously doesn't only happen with that loader. It also happens with the others, such as AutoGPTQ.
I always encounter this issue when I haven't selected the correct instruction template for the model. Nous Hermes is compatible with Alpaca prompting; your problem is probably the same.
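For reference, the Alpaca-style prompt that such a template produces looks roughly like this (the exact preamble wording varies between templates, so treat it as a sketch rather than the exact Nous Hermes format):
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Hello

### Response: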
I just tried with Nous Hermes, using the Colab code I posted in the previous message and the Alpaca instruction template, and nothing changes.
I get this error:
I'm starting to think that something is wrong with my installation.
Could you please share the code you use for the whole process, from installation to running the models?
I still use this same code in Google Colab:
#@title 2. Install the Web UI & LLM
import os
import shutil
from IPython.display import clear_output
%cd /content/
!apt-get -y install -qq aria2
!git clone https://github.com/oobabooga/text-generation-webui
%cd /content/text-generation-webui
!pip install -r requirements.txt
!pip install -U gradio==3.28.3
!mkdir /content/text-generation-webui/repositories
%cd /content/text-generation-webui/repositories
!git clone -b cuda https://github.com/oobabooga/GPTQ-for-LLaMa.git
%cd GPTQ-for-LLaMa
!python setup_cuda.py install
%cd /content/text-generation-webui/extensions/api
!pip install -r requirements.txt
%cd /content/text-generation-webui
!python server.py --share --chat --api --public-api
Oh, that old CUDA branch of GPTQ-for-LLaMA is no longer supported.
Please use AutoGPTQ 0.3.1 or ExLlama. ExLlama is much faster, and it is the recommended option in text-generation-webui.
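In the Colab above, that would roughly mean replacing the GPTQ-for-LLaMa / setup_cuda.py steps with one of these (the auto-gptq version pin is just the one suggested above; the exllama commands mirror the ones earlier in this thread):
# Option A: AutoGPTQ from PyPI
!pip install auto-gptq==0.3.1

# Option B: ExLlama under repositories/ (recommended)
%cd /content/text-generation-webui
!mkdir -p repositories
!git clone https://github.com/turboderp/exllama repositories/exllama
!pip install -r repositories/exllama/requirements.txt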
OK, so it is working fine for you with ExLlama? That's fine then!
Yes, working fine. Thanks.
I found the problem, guys. The problem is the value of "max_new_tokens". If you want to reproduce it, set it to 4096 and then send messages; the error will appear.
I also noticed that the greater the value, the more the response drifts from the context and becomes confused. This behaviour is clearly visible with values well above 2048 and with models like Nous Hermes, WizardLM and Dolphin.
It seems that LLaMA2-chat-GPTQ can handle it (meaning that it does not produce those empty messages), but if you set a high value, it will forget the context (even the previous response) and give more confused replies.
Let me know if you can try it.
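A rough way to picture it: max_new_tokens is reserved out of the same window that has to hold your prompt and chat history, so on a 2048-context model a value of 4096 leaves nothing for the input. A minimal Python sketch of that budget (the numbers and the truncation assumption are illustrative, not taken from the webui code):
# Illustrative token budget: max_new_tokens is reserved from the context window
context_length = 2048    # typical context size for these LLaMA-1-based models
max_new_tokens = 4096    # the value that triggers the empty responses
prompt_tokens = 35       # e.g. a short "Hello" turn plus the instruction template

print(context_length - max_new_tokens)         # -2048: no room left for the prompt
print(context_length - 512 - prompt_tokens)    # 1501: plenty of room with a sane value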
Ahh yeah, I've heard that - it's likely a bug in text-generation-webui or ExLlama, I believe. Note this isn't directly related to the context length of the model, which is set on the Model screen. But yes, I was told that if you let it generate more than 2K new tokens, it might not respond or might throw errors.