Meta-Llama-3.1-8B-Instruct deployment on AWS SageMaker fails

#61
by Keertiraj - opened

I followed the instructions on Hugging Face to deploy the 'meta-llama/Meta-Llama-3.1-8B-Instruct' model on AWS SageMaker. Here is the error log:

ValueError: rope_scaling must be a dictionary with two fields, type and factor, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}
rank=0 2024-07-29T11:32:06.411916Z ERROR text_generation_launcher: Shard 0 failed to start

Error: ShardCannotStart
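From the error, it looks like the Llama 3.1 config.json ships a new-style rope_scaling block (rope_type "llama3" plus frequency factors), while the transformers version bundled with older TGI images only accepts the legacy two-field schema. Something like this check (a paraphrased sketch, not the exact library source) seems to be rejecting it:

def _rope_scaling_validation(rope_scaling):
    # Paraphrased sketch of the legacy check in older transformers
    # releases (not the exact library source).
    if rope_scaling is None:
        return
    # Only the old {"type": ..., "factor": ...} schema passes; the new
    # llama3-style dict carries five keys and trips this check.
    if not isinstance(rope_scaling, dict) or len(rope_scaling) != 2:
        raise ValueError(
            "rope_scaling must be a dictionary with two fields, type and factor, "
            f"got {rope_scaling}"
        )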

The code snippet:

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'meta-llama/Meta-Llama-3.1-8B-Instruct',
    'SM_NUM_GPUS': json.dumps(1),
    'HUGGING_FACE_HUB_TOKEN': ''
}

assert hub['HUGGING_FACE_HUB_TOKEN'] != '', "You have to provide a token."

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="2.0.2"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

# send request
predictor.predict({
    "inputs": "Hey my name is Julien! How are you?",
})

The only change I made relative to the Hugging Face deployment guide is this line:

image_uri=get_huggingface_llm_image_uri("huggingface", version="2.0.2")  # 2.0.2 instead of 2.2.0, since JupyterLab in SageMaker Studio doesn't support Hugging Face LLM image version 2.2.0 yet
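A quick way to check whether your installed sagemaker SDK can resolve a given image version (a minimal sketch; upgrading the SDK with pip install -U sagemaker may be needed before newer versions resolve):

# Minimal check: ask the SDK to resolve the TGI image URI for a version.
# If your sagemaker release predates that container version, this call
# fails, which suggests upgrading the SDK rather than changing regions.
from sagemaker.huggingface import get_huggingface_llm_image_uri

print(get_huggingface_llm_image_uri("huggingface", version="2.2.0"))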

If anyone has already resolved this issue, could you share the solution? Thank you.

Hey, did you ever figure this out? Please do let me know.

Version 2.2.0 of the LLM image should work now; it ships a transformers release that understands Llama 3.1's rope_scaling format.

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID':'meta-llama/Meta-Llama-3.1-8B',
    'SM_NUM_GPUS': json.dumps(1),
    'HUGGING_FACE_HUB_TOKEN': '<REPLACE WITH YOUR TOKEN>'
}

assert hub['HUGGING_FACE_HUB_TOKEN'] != '<REPLACE WITH YOUR TOKEN>', "You have to provide a token."

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface",version="2.2.0"),
    env=hub,
    role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

# send request
predictor.predict({
    "inputs": "My name is Julien and I like to",
})
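If it helps, the endpoint also accepts TGI generation parameters in the request payload, and you can tear the endpoint down afterwards to avoid idle instance charges. A minimal sketch (the parameter values are illustrative, not recommendations):

# Optional: pass TGI generation parameters with the request
# (values here are illustrative)
predictor.predict({
    "inputs": "My name is Julien and I like to",
    "parameters": {
        "max_new_tokens": 256,
        "temperature": 0.7,
        "top_p": 0.9,
    },
})

# Clean up when done so the instance stops billing
predictor.delete_model()
predictor.delete_endpoint()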

@adaryan Yes, the above code should work now. If you still face any issues, share your code snippet here and I will look into it.
