Token Limit Lower Than Base Model?

#15
by JamesConley - opened

I noticed in the config that "max_position_embeddings": 2048. The base 70B model has a 4096-token context length (see https://huggingface.co/meta-llama/Llama-2-70b-chat-hf/blob/main/config.json).
Was this intentionally reduced? Additionally, the tokenizer indicates an even lower token limit (see the warning below):

Token indices sequence length is longer than the specified maximum sequence length for this model (2661 > 1500). Running this sequence through the model will result in indexing errors
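For reference, a minimal way to compare the two limits (a sketch; the repo id below is a placeholder for the model under discussion, and a gated model like Llama-2 requires accepting its license and authenticating first):

```python
from transformers import AutoConfig, AutoTokenizer

# Placeholder repo id; swap in the actual model path you are checking.
model_id = "meta-llama/Llama-2-70b-chat-hf"

config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Context window recorded in the model config.
print("max_position_embeddings:", config.max_position_embeddings)

# Tokenizer-side limit; this is the number the "(2661 > 1500)" warning
# compares against, and it can be set independently of the model config.
print("model_max_length:", tokenizer.model_max_length)
```

Note that the "indexing errors" warning is driven by the tokenizer's model_max_length, not by the model weights themselves, so the two values can disagree.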

JamesConley changed discussion title from Token Limit to Token Limit Lower Than Base Model?
