Gibberish on 'latest', with recent qwopqwop GPTQ/triton and ooba?

Opened by andysalerno

Hi, I'm sure I'm making an obvious mistake, but hoping someone can point it out to me.

I'm getting gibberish output from this model on the 'latest' branch, the one with act-order.

I'm on qwopqwop's GPTQ-for-LLaMa, triton branch, at commit 05781593c81 (May 8th, the most recent commit as of this posting),

and ooba's text-generation-webui at commit ab08cf646543c (May 14th, today).

I'm on native Arch Linux, not WSL.

Is there something I need to change? Do I need to be on the bleeding-edge GPTQ branch called "fastest-inference-4bit", which has the most recent activity?

Thanks. And apologies for being yet another "gibberish output" post :) Really appreciate all the great work you're doing.

Same

USER: What is 4x8?
ASSISTANT: Burgлия Sud Reserve Stockrn Wall TournFD Beauobre tématuMDb husrut Star stickbourgoin respectEventListener Bour Bruno Fourierrn titles BlaConstraint Autor lo Matrixrou conspлияMatrix Fin framern Chart substitutionsko SudMDbлиялияrn BeauMDb Assume BurgлиялиялияAA

Same here... similar output to the above

Ugh, sorry about that. I went back to using the old ooba fork of GPTQ-for-LLaMa because if I use the latest version, people can't do CPU offloading. I didn't realise it would result in gibberish with the new fork. So if you're OK going back to https://github.com/oobabooga/GPTQ-for-LLaMa, it will work fine there.

I can also confirm it works great with AutoGPTQ, which you can use easily from Python code (you need to pass strict=False to .from_quantized() when loading the model)
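If you want to try that route from Python, something like this should work (a rough sketch: the local path, device and safetensors settings are assumptions you'll need to adjust for your setup; the key bit from above is the strict=False argument):

from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

# Assumption: the local directory you downloaded this repo into
model_dir = "/path/to/Wizard-Vicuna-13B-Uncensored-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    device="cuda:0",        # assumption: a single CUDA GPU
    use_safetensors=True,   # assumption: the weights are a .safetensors file
    strict=False,           # the important part, as noted above
)
# Depending on how the weights file is named, you may also need to pass
# model_basename="<weights file name without extension>".

prompt = "USER: What is 4x8?\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))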

And early/preliminary support for AutoGPTQ was just added to text-generation-webui, so you could experiment with it there.

I'd like to say I'll add a '128-latest' version like I used to do. But I'm uploading so many models now that I can't promise I'll get to it.


Has anyone got AutoGPTQ working?

I added a quantize_config.json file:

{
  "bits": 4,
  "desc_act": true,
  "true_sequential": true,
  "group_size": 128
}

then started with:

python server.py --autogptq --model-dir /data --model Wizard-Vicuna-13B-Uncensored-GPTQ_last --listen-host 0.0.0.0 --chat --api --notebook --xformers

But the model still outputs gibberish like 'Burgлия'.


It should be "desc_act": false
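In other words, a quantize_config.json like this (the same values you posted, just with desc_act flipped; the copy that ships in the repo is the one that actually matters):

{
  "bits": 4,
  "desc_act": false,
  "true_sequential": true,
  "group_size": 128
}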

But you don't need to specify that yourself; it's already in the repo.

Just download the full contents of this repo and run the latest text-gen-ui. You don't even need to specify --autogptq, as that's the default now.
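For example, something like this (a sketch; I'm assuming the repo is TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ and that git-lfs is installed, so adjust to whatever path or download tool you normally use):

git lfs install
git clone https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ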

Hi, I'm just starting out on my journey here so apologies for what is probably a very dumb question. I'm getting the gibberish response and this thread explains how to fix it... but I don't understand what I need to do.

I'm not using Python, but the very last line of this thread says "Just download the full contents of this repo and run latest text-gen-ui", which seems to say that downloading the repo and running the latest text-gen-ui should work. But, as I said, I'm getting the gibberish.

I'm clearly misunderstanding something but I don't know what, can anyone help?

Same as kimh49, I'm having this issue on oobabooga's latest text-generation-webui. Here is an example of the results I'm getting:
[screenshot of gibberish output: firefox_PBaMbD0HU4.png]

I'm not sure if it's related, but I also keep seeing this in the log when I try to generate anything:

D:\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py:648: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(

Would greatly appreciate any help in sorting this out.
