TGI error
#5
by
aiamateur101
- opened
I encountered the same error as in https://github.com/huggingface/text-generation-inference/issues/601#issuecomment-1652866165.
TGI throws an error in the warm-up stage:
```
warmup{max_input_length=4096 max_prefill_tokens=4096}:warmup: text_generation_client: router/client/src/lib.rs:33: Server error: Not enough memory to handle 4096 prefill tokens. You need to decrease --max-batch-prefill-tokens
Error: Warmup(Generation("Not enough memory to handle 4096 prefill tokens. You need to decrease --max-batch-prefill-tokens"))
```
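As the message suggests, one workaround is to lower `--max-batch-prefill-tokens` (and, since a single request's prompt must still fit in the prefill budget, `--max-input-length` as well) when launching the server. A hedged sketch of such a launch command, where the model id, port, and exact token limits are placeholders to adapt to your deployment:

```shell
# Hypothetical TGI launch with a reduced prefill budget.
# Model id and token limits below are example values, not a fixed recipe.
docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id meta-llama/Llama-2-7b-hf \
  --max-input-length 2048 \
  --max-total-tokens 4096 \
  --max-batch-prefill-tokens 2048   # must be >= --max-input-length
```

If warmup still fails at 2048, the GPU may simply not have enough free memory for the model at these settings; quantization or a smaller model would be the next thing to try.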