Max length 2048 error
#5
by abhatia2
Hey,
I am getting this error for large inputs:
{"error":"Input validation error: `inputs` tokens + `max_new_tokens` must be <= 2048. Given: 2037 `inputs` tokens and 400 `max_new_tokens`","error_type":"validation"}
Llama 2 models have a 4096-token context length; is this something that can be configured during deployment?
By the way, I am getting this error for other quantized models too, like TheBloke/Llama-2-70B-Chat-GPTQ.
I was able to resolve this by setting the MAX_TOTAL_TOKENS parameter as mentioned in the docs: https://huggingface.co/docs/text-generation-inference/basic_tutorials/launcher#maxtotaltokens
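For anyone hitting the same error: the 2048 limit comes from the launcher defaults, not from the model itself, so the limits need to be raised at launch time. Below is a minimal sketch of how that could look when starting text-generation-inference directly; the specific values (3696 input tokens, 4096 total) are my own assumptions for a Llama 2 deployment and should be adjusted to your hardware and model, not settings taken from the original setup.

```sh
# Sketch: launch TGI with a larger context window (values are assumptions).
# --max-total-tokens caps prompt + generated tokens per request;
# --max-input-length caps the prompt alone;
# --max-batch-prefill-tokens should be at least as large as the input limit.
text-generation-launcher \
  --model-id TheBloke/Llama-2-70B-Chat-GPTQ \
  --quantize gptq \
  --max-input-length 3696 \
  --max-total-tokens 4096 \
  --max-batch-prefill-tokens 4096
```

If you deploy through a managed container instead of the launcher, the same settings can usually be passed as environment variables (e.g. MAX_INPUT_LENGTH, MAX_TOTAL_TOKENS), as described on the launcher docs page linked above.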