Neuronx model for meta-llama/Llama-2-7b-hf
This repository contains AWS Inferentia2 and neuronx compatible checkpoints for meta-llama/Llama-2-7b-hf. You can find detailed information about the base model on its Model Card.

This model has been exported to the neuron format using the specific input_shapes and compiler parameters detailed in the paragraphs below. Please refer to the 🤗 optimum-neuron documentation for an explanation of these parameters.
Usage on Amazon SageMaker
coming soon
Usage with 🤗 optimum-neuron
```python
>>> from optimum.neuron import pipeline
>>> p = pipeline('text-generation', 'aws-neuron/Llama-2-7b-hf-neuron-latency')
>>> p("My favorite place on earth is", max_new_tokens=64, do_sample=True, top_k=50)
[{'generated_text': 'My favorite place on earth is the ocean. It is where I feel most at peace. I love to travel and see new places. I have a'}]
```
This repository contains tags specific to versions of neuronx. When using with 🤗 optimum-neuron, use the repo revision matching the version of neuronx you are using, to load the right serialized checkpoints.
Arguments passed during export
input_shapes
```json
{
  "batch_size": 1,
  "sequence_length": 2048
}
```
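To give a sense of what these shapes imply at inference time, here is a back-of-envelope KV-cache size estimate for the exported shapes. The architecture numbers (32 decoder layers, hidden size 4096) are assumptions taken from the standard Llama-2-7b configuration, not from this repository:

```python
# Rough KV-cache size for the exported shapes, assuming the standard
# Llama-2-7b architecture (32 layers, hidden size 4096) and fp16 weights.
batch_size = 1
sequence_length = 2048
num_layers = 32       # assumption: standard Llama-2-7b config
hidden_size = 4096    # assumption: standard Llama-2-7b config
bytes_per_element = 2 # fp16

# Each cached token stores one key and one value vector per layer.
kv_bytes = 2 * num_layers * hidden_size * bytes_per_element \
    * sequence_length * batch_size
print(kv_bytes / 2**30, "GiB")  # 1.0 GiB
```

A longer `sequence_length` or larger `batch_size` grows this cache proportionally, which is why these shapes are fixed at export time for Inferentia2.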
compiler_args
```json
{
  "auto_cast_type": "fp16",
  "num_cores": 24
}
```
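As an illustration, the input shapes and compiler arguments above correspond to an export invocation along these lines. This is a sketch of the `optimum-cli export neuron` command, not the exact command used for this repository; check the flags against the optimum-neuron documentation for your SDK version (the output directory name is a placeholder):

```shell
optimum-cli export neuron \
  --model meta-llama/Llama-2-7b-hf \
  --batch_size 1 \
  --sequence_length 2048 \
  --num_cores 24 \
  --auto_cast_type fp16 \
  llama2_7b_neuron/
```

Running this requires a Neuron-enabled instance (e.g. Inferentia2) with the AWS Neuron SDK installed.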