[Cache Request] aws-neuron/Llama-2-7b-chat-hf-seqlen-2048-bs-2

#9
by RamiroRamirez - opened

Please add the following model to the neuron cache.

RamiroRamirez changed discussion title from [Cache Request] aws-neuron/Llama-2-7b-chat-hf-seqlen-2048-bs-1 to [Cache Request] aws-neuron/Llama-2-7b-chat-hf-seqlen-2048-bs-2
dacorvo (AWS Inferentia and Trainium org)

The model is cached for optimum-neuron versions >= 0.0.20 (with a sequence length of 4096).
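For reference, a minimal sketch of how a configuration like this can be compiled and loaded with optimum-neuron so that it hits the cache. The base model id `meta-llama/Llama-2-7b-chat-hf` and the compiler arguments (`num_cores`, `auto_cast_type`) are assumptions for illustration, not values stated in this request:

```python
from optimum.neuron import NeuronModelForCausalLM

# Input shapes should match a cached configuration to avoid recompilation
# (batch_size=2 as requested; the cached sequence length is 4096).
input_shapes = {"batch_size": 2, "sequence_length": 4096}
# Illustrative compiler arguments; adjust to your Inferentia/Trainium instance.
compiler_args = {"num_cores": 2, "auto_cast_type": "fp16"}

model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # assumed source checkpoint
    export=True,
    **compiler_args,
    **input_shapes,
)
model.save_pretrained("llama-2-7b-chat-neuron")
```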

dacorvo changed discussion status to closed
