[Cache Request] aws-neuron/Llama-2-7b-chat-hf-seqlen-2048-bs-2

#9
by RamiroRamirez - opened

Please add the following model to the neuron cache.

RamiroRamirez changed discussion title from [Cache Request] aws-neuron/Llama-2-7b-chat-hf-seqlen-2048-bs-1 to [Cache Request] aws-neuron/Llama-2-7b-chat-hf-seqlen-2048-bs-2
dacorvo (AWS Inferentia and Trainium org)

The model is cached for optimum-neuron versions >= 0.0.20 (with a sequence length of 4096).
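For reference, a minimal sketch of how a configuration like this can be compiled and loaded with optimum-neuron so that it hits the cache. The base model id `meta-llama/Llama-2-7b-chat-hf` and the compiler arguments (`num_cores`, `auto_cast_type`) are assumptions for illustration, not values stated in this request:

```python
from optimum.neuron import NeuronModelForCausalLM

# Input shapes should match a cached configuration to avoid recompilation
# (batch_size=2 as requested; the cached sequence length is 4096).
input_shapes = {"batch_size": 2, "sequence_length": 4096}
# Illustrative compiler arguments; adjust to your Inferentia/Trainium instance.
compiler_args = {"num_cores": 2, "auto_cast_type": "fp16"}

model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # assumed source checkpoint
    export=True,
    **compiler_args,
    **input_shapes,
)
model.save_pretrained("llama-2-7b-chat-neuron")
```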

dacorvo changed discussion status to closed
