RuntimeError: shape '[-1, 1024, 6656]' is invalid for input of size 44302336

#6
by phooney - opened

Hi there,

I wouldn't be surprised if this is just me having missed some setting I was supposed to put in, but I was wondering if somebody could help me. I can load the OpenAssistant 30B 1024g version and it responds to short prompts, but longer prompts give the error below. Eventually, after a few short prompts, all requests result in this error, so it appears to be 'running out of room' in some way.

Traceback (most recent call last):
File "C:*******\Oobabooga\oobabooga_windows\oobabooga_windows\text-generation-webui\modules\callbacks.py", line 73, in gentask
ret = self.mfunc(callback=_callback, **self.kwargs)
File "C:*******\Oobabooga\oobabooga_windows\oobabooga_windows\text-generation-webui\modules\text_generation.py", line 251, in generate_with_callback
shared.model.generate(**kwargs)
File "C:*******\Oobabooga\oobabooga_windows\oobabooga_windows\installer_files\env\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:*******\Oobabooga\oobabooga_windows\oobabooga_windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 1485, in generate
return self.sample(
File "C:*******\Oobabooga\oobabooga_windows\oobabooga_windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 2524, in sample
outputs = self(
File "C:*******\Oobabooga\oobabooga_windows\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:*******\Oobabooga\oobabooga_windows\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 687, in forward
outputs = self.model(
File "C:*******\Oobabooga\oobabooga_windows\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:*******\Oobabooga\oobabooga_windows\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 577, in forward
layer_outputs = decoder_layer(
File "C:*******\Oobabooga\oobabooga_windows\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:*******\Oobabooga\oobabooga_windows\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 292, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "C:*******\Oobabooga\oobabooga_windows\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:*******\Oobabooga\oobabooga_windows\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 196, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "C:*******\Oobabooga\oobabooga_windows\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:*******\Oobabooga\oobabooga_windows\oobabooga_windows\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 362, in forward
weight = weight.reshape(-1, self.groupsize, weight.shape[2])
RuntimeError: shape '[-1, 1024, 6656]' is invalid for input of size 44302336
Output generated in 0.25 seconds (0.00 tokens/s, 0 tokens, context 157, seed 162660829)
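
For what it's worth, the numbers in the error look consistent with a group-size mismatch: 44302336 = 6656 x 6656, and 6656 (the LLaMA-30B hidden size) is not a multiple of 1024, so the reshape in quant.py can't split the weight into whole groups. Here's a minimal sketch (my own illustration, not code from the webui) that reproduces just the failing reshape; my guess is the dequantising path that hits it is only taken for longer inputs, which would explain why short prompts work:

import torch

hidden_size = 6656   # LLaMA-30B hidden dimension; the q_proj weight has 6656 * 6656 = 44302336 elements
groupsize = 1024     # group size this model was quantised with

weight = torch.zeros(hidden_size * hidden_size)
print(weight.numel() % (groupsize * hidden_size))  # 3407872 -> can't be split into whole (1024, 6656) groups
weight.reshape(-1, groupsize, hidden_size)         # RuntimeError: shape '[-1, 1024, 6656]' is invalid for input of size 44302336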

Same error here with the exact same message (on Linux, that is). It happens after a few short prompts. I also have the 1024g version from the main branch.

Could you both check you're running the latest version of text-generation-webui, and then re-save your GPTQ parameters for the model. There was a bug with saving GPTQ parameters in textgen, which was recently fixed.

Are either of you using pre_layer/CPU offload, or multiple GPUs, or any special config like that?

If you're still getting the problem after updating, please let me know and I'll test it myself.
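
If it helps while testing, one way to take the saved-settings bug out of the equation is to pass the GPTQ parameters explicitly on the command line when launching the webui, something like the following (the model folder name here is just a placeholder):

python server.py --model <your-model-folder> --wbits 4 --groupsize 1024 --model_type llama --chat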

Hi there,

I updated and got this error:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
numba 0.56.4 requires numpy<1.24,>=1.18, but you have numpy 1.24.3 which is incompatible.

Not sure how important that is, as the updater kept going and doing its thing after that, but I thought I'd mention it in case it was relevant.

Anyway, after that I am still getting the same error/behaviour as before. It doesn't appear to have any trouble saving the parameters: I can save them (4 wbits, 1024 groupsize, llama) and reload the model no problem. The settings appear to persist when I load a different model and then load OpenAssistant again.

I'm using 1x RTX 4090 and I have pre_layer at 0. None of auto-devices, disk, cpu, bf16, or load-in-8bit is ticked. gpu-memory in MiB for device: 0; cpu-memory in MiB: 0.

OK, I will test this again shortly and let you know

What OS are you running?

Win 11 Pro for me. Thank you for looking into it, much appreciated.


Same here; the 1024g version doesn't seem to be supported by ooba, even with --groupsize 1024 as an input arg.

OK, I'm going to do a no-groupsize model instead. That's what I'm doing for new 30B models anyway, as otherwise they will OOM on long replies on 24GB cards.

I found out that this has something to do with the --chat argument used with characters, or something like that.
I installed Linux and got everything working.

As soon as I tried to run with --chat and the example character, I started getting this error again.

Does this mean that we CANNOT use any type of character with this model?

Sounds like the --chat issue might be a text-generation-webui bug.

But I'm doing another model now, made with no groupsize. That will be better anyway, as it needs less VRAM. It'll be uploaded in 2-3 hours.

Probably. The model works fine, but as soon as context is entered for a character, it gives the error.
I'll submit a bug report on oobabooga since I finally know what is causing the issue now.

I just tested 1024g with my own 30B model, which I converted with AutoGPTQ. It also hits this error when loaded with the old CUDA code. The only GPTQ implementation that was able to run it was the 4-bit LoRA autograd one; there I had no crash on the forward pass, but I ran out of memory instead. I thought this was supposed to help by using less VRAM, but instead it uses more.

I think 1024g group size can be laid to rest.

It uses less VRAM than 128g, but more than no groupsize at all.
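
As a rough back-of-envelope (my own estimate, assuming GPTQ stores one fp16 scale plus one packed 4-bit zero point per group per output column), here's the extra per-matrix storage for a single 6656x6656 LLaMA-30B projection at different group sizes:

in_features = out_features = 6656
bytes_per_group_per_column = 2 + 0.5  # fp16 scale + packed 4-bit zero point (assumed layout)

for groupsize in (128, 1024, in_features):  # groupsize == in_features is effectively "no groupsize"
    groups = -(-in_features // groupsize)   # ceiling division: number of groups per column
    overhead_mb = groups * out_features * bytes_per_group_per_column / 1024**2
    print(f"groupsize {groupsize}: ~{overhead_mb:.2f} MB of scales/zeros")

Summed over all the projections in a 60-layer 30B model, that works out to very roughly a few hundred MB for 128g, a few tens of MB for 1024g, and almost nothing with no groupsize, consistent with the ordering above.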

The no-groupsize model is taking absolutely forever to pack, but it's nearly done. It'll be uploaded fairly soon.

So no groupsize is still the best for memory usage? Good to know. We need a shootout between 128g and act-order by itself to see which wins on perplexity, and by how much.

Model with group_size = None and --act-order is up in main. Those of you having trouble with the 1024g file, please test this and report back.

The previous 1024g-compat file has been moved to its own new branch.
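
For anyone who wants to grab a specific branch rather than main, a small huggingface_hub sketch (the repo id and branch name below are placeholders, not the actual ones):

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="your-org/your-model-GPTQ",  # placeholder repo id
    revision="main",                     # or the name of the 1024g branch
)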

New version works for me, thank you!
