Max length 2048 error
#5
by abhatia2
Hey,
I am getting this error for large inputs:
{"error":"Input validation error: `inputs` tokens + `max_new_tokens` must be <= 2048. Given: 2037 `inputs` tokens and 400 `max_new_tokens`","error_type":"validation"}
Llama 2 models have a 4096-token context length; is this something that can be configured during deployment?
By the way, I am getting this error for other quantized models too, like TheBloke/Llama-2-70B-Chat-GPTQ.
I was able to resolve this by setting the MAX_TOTAL_TOKENS parameter as mentioned in the docs: https://huggingface.co/docs/text-generation-inference/basic_tutorials/launcher#maxtotaltokens
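For anyone hitting the same error: the 2048 limit comes from the launcher defaults, not from the model itself, so the limits need to be raised at launch time. Below is a minimal sketch of how that could look when starting text-generation-inference directly; the specific values (3696 input tokens, 4096 total) are my own assumptions for a Llama 2 deployment and should be adjusted to your hardware and model, not settings taken from the original setup.

```sh
# Sketch: launch TGI with a larger context window (values are assumptions).
# --max-total-tokens caps prompt + generated tokens per request;
# --max-input-length caps the prompt alone;
# --max-batch-prefill-tokens should be at least as large as the input limit.
text-generation-launcher \
  --model-id TheBloke/Llama-2-70B-Chat-GPTQ \
  --quantize gptq \
  --max-input-length 3696 \
  --max-total-tokens 4096 \
  --max-batch-prefill-tokens 4096
```

If you deploy through a managed container instead of the launcher, the same settings can usually be passed as environment variables (e.g. MAX_INPUT_LENGTH, MAX_TOTAL_TOKENS), as described on the launcher docs page linked above.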