Inference error: tensor shape mismatch
Hi everyone, and thanks @TheBloke for your great work.
I'm trying to run inference with TheBloke/Llama-2-70B-chat-GPTQ and I get the following error:
out = out + self.bias if self.bias is not None else out
RuntimeError: The size of tensor a (24576) must match the size of tensor b (10240) at non-singleton dimension 2
At first I thought it was an installation problem, but my code works fine with TheBloke/Llama-2-13B-chat-GPTQ... It also occurs with FreeWilly2, maybe because it's based on Llama-2-70B.
Any help would be appreciated.
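For reference, this is roughly how I'm loading the model (a minimal sketch; the prompt and generation settings are illustrative, my actual script is longer):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name = "TheBloke/Llama-2-70B-chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    use_safetensors=True,
    device="cuda:0",
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda:0")
# The RuntimeError above is raised inside this forward pass
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```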
Can you check the sha256sum of the .safetensors file, or just try downloading the model again? The download may have terminated early, leaving you with a truncated, invalid file.
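For example (a minimal sketch; compare the printed hash against the SHA256 listed on the model page, and point it at the .safetensors file you actually downloaded):

```python
import hashlib

def sha256sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 of a file without loading it fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Adjust the path to your local copy of the model weights
print(sha256sum("path/to/model.safetensors"))
```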
Also, please confirm you're using Transformers 4.31.0, which is required for 70B.
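Incidentally, the two sizes in your traceback are consistent with missing 70B support: Llama-2-70B uses grouped-query attention, so a fused QKV projection is 8192 + 2 × 1024 = 10240 wide, whereas code without GQA support expects 3 × 8192 = 24576. You can check the installed version from Python:

```python
import transformers
from packaging import version  # packaging ships as a Transformers dependency

installed = version.parse(transformers.__version__)
print(installed)
# Llama-2-70B needs the grouped-query attention support added in 4.31.0
assert installed >= version.parse("4.31.0"), "run: pip install -U transformers"
```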
Thank you so much @TheBloke! It was the Transformers version; I thought I had the newest one!
Regards!
I'm facing a similar problem when fine-tuning with AutoGPTQ. Did you manage to solve it?
@tridungduong16 Make sure you're using AutoGPTQ 0.3.2 + Transformers 4.31.0.
@tridungduong16 I'm confused: you said you had a problem with AutoGPTQ, but your error screenshot shows ExLlama, not AutoGPTQ?
If you're using ExLlama, please make sure it's updated to the latest version. This model definitely works with ExLlama, so you might have an older version that doesn't support 70B.
Sorry, I attached the wrong screenshot. I'm using the fine-tuning script from https://github.com/PanQiWei/AutoGPTQ/blob/main/examples/peft/peft_lora_clm_instruction_tuning.py.
It works well for 13B models such as:
- https://huggingface.co/TheBloke/Llama-2-13B-GPTQ
- https://huggingface.co/TheBloke/OpenAssistant-Llama2-13B-Orca-8K-3319-GGML
but when I fine-tune the 70B model, there are problems.
The library versions I'm using are:
- transformers.__version__: '4.32.0.dev0'
- auto_gptq.__version__: '0.3.2'
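For context, here is the model-loading part of that script, stripped down (a sketch based on the linked example; GPTQLoraConfig and get_gptq_peft_model are AutoGPTQ's peft utilities as I understand them in 0.3.x, the LoRA hyperparameters are illustrative, and the only change versus the working 13B runs is the repo name):

```python
from auto_gptq import AutoGPTQForCausalLM
from auto_gptq.utils.peft_utils import GPTQLoraConfig, get_gptq_peft_model
from peft import TaskType

# Load the quantized base model in trainable mode
model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/Llama-2-70B-GPTQ",  # swapping in the 70B repo is the only change
    use_triton=True,
    trainable=True,
)

# Wrap the quantized model with LoRA adapters
peft_config = GPTQLoraConfig(
    r=16,                         # illustrative hyperparameters
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
)
model = get_gptq_peft_model(
    model,
    peft_config=peft_config,
    auto_find_all_linears=True,   # attach LoRA to all quantized linear layers
    train_mode=True,
)
model.print_trainable_parameters()
```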