Receive empty response, regardless of which loader I choose
There is something strange happening (the same happens with the new Nous-Hermes): the model loads, but whatever message I send, I receive a totally empty response (it does not even show my question). I tried this with AutoGPTQ, ExLlama and GPTQ-for-LLaMA. I'll show you an example:
After loading the model, if I try to send "Hello", this happens:
Am I missing something?
Thank you, and sorry for bothering you. I hope this can help.
Yeah, there's something wrong with your install. Have you updated text-generation-webui to the latest version? Are you using the one-click installer or a manual install? If manual, make sure to git pull both text-generation-webui and exllama, and to re-run pip3 install -r requirements.txt in text-generation-webui.
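In Colab, that update would look roughly like this (assuming the repo lives at /content/text-generation-webui and exllama was cloned under repositories/exllama, as in the commands later in this thread):
%cd /content/text-generation-webui
!git pull
!pip3 install -r requirements.txt
# also update exllama if it is installed under repositories/
%cd repositories/exllama
!git pull
!pip3 install -r requirements.txt
%cd /content/text-generation-webui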
I did this inside a Colab:
!git clone https://github.com/oobabooga/text-generation-webui
%cd text-generation-webui
!pip install -r requirements.txt
and I only run this if I want to use GPTQ-for-LLaMA:
%mkdir /content/text-generation-webui/repositories/
%cd /content/text-generation-webui/repositories/
!git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
%cd GPTQ-for-LLaMa
!pip install ninja
!pip install -r requirements.txt
!python setup_cuda.py install
And what about exllama? Did you install that? You need to install it before you use it
# in text-generation-webui directory
!mkdir repositories
!git clone https://github.com/turboderp/exllama repositories/exllama
!pip3 install -r repositories/exllama/requirements.txt
Yes, I also installed ExLlama, but the problem I showed previously doesn't only happen with that loader. It also happens with the others, such as AutoGPTQ.
I always encounter this issue when I haven't selected the correct instruction template for the model. Nous Hermes is compatible with Alpaca prompting; your problem is probably the same.
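For reference, the Alpaca-style prompt that such a template produces looks roughly like this (the exact preamble wording varies between templates, so treat it as a sketch rather than the exact Nous Hermes format):
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Hello

### Response: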
I just tried with Nous Hermes, using the Colab code I posted in the previous message and the Alpaca instruction template, and nothing changes.
I get this error:
I'm starting to think that something is wrong with my installation.
Could you please share the code you use for the whole process, from installation to running the models?
I still use this same code in Google Colab:
#@title 2. Install the Web UI & LLM
import os
import shutil
from IPython.display import clear_output
%cd /content/
!apt-get -y install -qq aria2
!git clone https://github.com/oobabooga/text-generation-webui
%cd /content/text-generation-webui
!pip install -r requirements.txt
!pip install -U gradio==3.28.3
!mkdir /content/text-generation-webui/repositories
%cd /content/text-generation-webui/repositories
!git clone -b cuda https://github.com/oobabooga/GPTQ-for-LLaMa.git
%cd GPTQ-for-LLaMa
!python setup_cuda.py install
%cd /content/text-generation-webui/extensions/api
!pip install -r requirements.txt
%cd /content/text-generation-webui
!python server.py --share --chat --api --public-api
Oh, that old CUDA branch of GPTQ-for-LLaMA is no longer supported.
Please use AutoGPTQ 0.3.1 or ExLlama. ExLlama is much faster, and it is the recommended option in text-generation-webui.
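In the Colab above, that would roughly mean replacing the GPTQ-for-LLaMa / setup_cuda.py steps with one of these (the auto-gptq version pin is just the one suggested above; the exllama commands mirror the ones earlier in this thread):
# Option A: AutoGPTQ from PyPI
!pip install auto-gptq==0.3.1

# Option B: ExLlama under repositories/ (recommended)
%cd /content/text-generation-webui
!mkdir -p repositories
!git clone https://github.com/turboderp/exllama repositories/exllama
!pip install -r repositories/exllama/requirements.txt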
OK, so it is working fine for you with ExLlama? That's fine then!
Yes, working fine. Thanks.
I found the problem, guys. The problem is the value of "max_new_tokens". If you want to reproduce it, set it to 4096 and then send messages; the error will appear.
I also noticed that the greater the value, the more the response drifts from the context and becomes confused. This behaviour is clearly visible with values well above 2048 and with models like Nous Hermes, WizardLM and Dolphin.
It seems that LLaMA2-chat-GPTQ can handle it (meaning that it does not produce those empty messages), but if you set a high value, it will forget the context (even the previous response) and give more confused replies.
Let me know if you can try it.
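A rough way to picture it: max_new_tokens is reserved out of the same window that has to hold your prompt and chat history, so on a 2048-context model a value of 4096 leaves nothing for the input. A minimal Python sketch of that budget (the numbers and the truncation assumption are illustrative, not taken from the webui code):
# Illustrative token budget: max_new_tokens is reserved from the context window
context_length = 2048    # typical context size for these LLaMA-1-based models
max_new_tokens = 4096    # the value that triggers the empty responses
prompt_tokens = 35       # e.g. a short "Hello" turn plus the instruction template

print(context_length - max_new_tokens)         # -2048: no room left for the prompt
print(context_length - 512 - prompt_tokens)    # 1501: plenty of room with a sane value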
Ahh yeah, I've heard that - it's likely a bug in text-generation-webui or ExLlama, I believe. Note this isn't directly related to the context length of the model, which is set on the Model screen. But yes, I was told that if you let it generate more than 2K new tokens, it might not respond or might throw errors.