TGI error
#5
by
aiamateur101
- opened
I encountered the same error as in https://github.com/huggingface/text-generation-inference/issues/601#issuecomment-1652866165.
TGI throws an error in the warm-up stage:
```
warmup{max_input_length=4096 max_prefill_tokens=4096}:warmup: text_generation_client: router/client/src/lib.rs:33: Server error: Not enough memory to handle 4096 prefill tokens. You need to decrease --max-batch-prefill-tokens
Error: Warmup(Generation("Not enough memory to handle 4096 prefill tokens. You need to decrease --max-batch-prefill-tokens"))
```
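As the message suggests, one workaround is to lower `--max-batch-prefill-tokens` (and, since a single request's prompt must still fit in the prefill budget, `--max-input-length` as well) when launching the server. A hedged sketch of such a launch command, where the model id, port, and exact token limits are placeholders to adapt to your deployment:

```shell
# Hypothetical TGI launch with a reduced prefill budget.
# Model id and token limits below are example values, not a fixed recipe.
docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id meta-llama/Llama-2-7b-hf \
  --max-input-length 2048 \
  --max-total-tokens 4096 \
  --max-batch-prefill-tokens 2048   # must be >= --max-input-length
```

If warmup still fails at 2048, the GPU may simply not have enough free memory for the model at these settings; quantization or a smaller model would be the next thing to try.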