Can't use with TGI. Getting `RuntimeError: weight transformer.h.0.self_attention.query_key_value.weight does not exist`
#12, opened by mpronesti
Hi there!
I'm trying to use this model with text-generation-inference (TGI). Here's the script:
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
docker run --gpus all --shm-size 2g -p 8080:80 -v $volume:/data \
ghcr.io/huggingface/text-generation-inference:latest \
--model-id TheBloke/falcon-7b-instruct-gptq \
--sharded false \
--quantize "gptq" \
--max-total-tokens 2048 \
--trust-remote-code
However, I get this error:
RuntimeError: weight transformer.h.0.self_attention.query_key_value.weight does not exist
Unfortunately, Text Generation Inference has included a version of GPTQ that doesn't support most of the GPTQ models currently on Hugging Face.
I hope to release new GPTQs in the future that will be compatible, but for now you'll need to find another GPTQ model that works with TGI, or make your own.
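If you want to check whether another GPTQ checkpoint is likely to hit the same error, one rough approach is to compare the tensor name the loader asks for with the names actually stored in the checkpoint. The sketch below is only an illustration: `diagnose_missing_weight` is a hypothetical helper, and the names in `example_names` are assumed for demonstration, not read from this repository. For a safetensors file, the real list could be obtained with `safetensors.safe_open(path, framework="pt").keys()`.

```python
def diagnose_missing_weight(expected, available):
    """Report whether `expected` is stored and list tensors under the same module."""
    # Module prefix, e.g. "transformer.h.0.self_attention.query_key_value."
    prefix = expected.rsplit(".", 1)[0] + "."
    siblings = sorted(name for name in available if name.startswith(prefix))
    return {"found": expected in available, "siblings": siblings}


# Assumed, illustrative tensor names (not taken from the actual checkpoint).
example_names = [
    "transformer.h.0.self_attention.query_key_value.qweight",
    "transformer.h.0.self_attention.query_key_value.qzeros",
    "transformer.h.0.self_attention.query_key_value.scales",
    "transformer.h.0.self_attention.query_key_value.g_idx",
]

report = diagnose_missing_weight(
    "transformer.h.0.self_attention.query_key_value.weight",
    example_names,
)
print("found:", report["found"])
for name in report["siblings"]:
    print("stored:", name)
```

If the requested name is absent but sibling tensors exist under the same module, the checkpoint and the loader probably disagree about the tensor layout, and a different GPTQ export may be needed.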