Issue deploying on SageMaker
Hi, I am trying to deploy this model on SageMaker and am running into issues I don't get with other models:
from sagemaker.huggingface import HuggingFaceModel
import boto3
from sagemaker import Session

# Replace with your access key and secret key
access_key = "key"
secret_key = "key"

# Create a boto3 session with the specified access key and secret key
boto3_session = boto3.Session(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    region_name="us-east-1"
)

# Use the boto3 session to create the IAM client
iam_client = boto3_session.client('iam')

# Create a SageMaker session with the custom boto3 session
sagemaker_session = Session(boto_session=boto3_session)

role = iam_client.get_role(RoleName='ROLE')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'TheBloke/WizardLM-7B-uncensored-GPTQ',
    'HF_TASK': 'text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    env=hub,
    role=role,
    sagemaker_session=sagemaker_session  # Pass the custom SageMaker session
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,               # number of instances
    instance_type='ml.g4dn.2xlarge'         # ec2 instance type
)
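For reference, I query the endpoint with the standard predict call. A minimal sketch (the prompt text is just a placeholder):

response = predictor.predict({
    "inputs": "Tell me about Amazon SageMaker."  # placeholder prompt
})
print(response)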
I am getting the following error when trying to query the endpoint after deployment:
{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027llama\u0027"
}
Is this a library that isn't being imported? Do I need a custom setup instead of just deploying to SageMaker? The Hugging Face inference export fails with the same error, which is probably worth bringing to your attention.
Thank you
I am getting the same error, any suggestions? I'm simply using the SageMaker deployment code listed above.
I'm afraid I don't know anything about SageMaker. But I'm happy to take pull requests if anyone figures out what's wrong.
I figured out the error, but unfortunately I don't see an immediate way to deploy this as a SageMaker endpoint. The SageMaker Hugging Face environment only supports older Transformers versions, and this model is a fine-tuned LLaMA model; the llama architecture was only added in Transformers 4.28: https://huggingface.co/decapoda-research/llama-7b-hf/discussions/39
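The bare "'llama'" message is consistent with a KeyError raised when the container's Transformers doesn't recognize the llama model type. A minimal sketch of the suspected failure mode (assuming a local install of transformers < 4.28):

# Sketch of the suspected failure mode: on transformers < 4.28 the "llama"
# model type is not registered, so the auto-config lookup raises
# KeyError('llama'), whose str() is exactly "'llama'".
from transformers.models.auto.configuration_auto import CONFIG_MAPPING

try:
    CONFIG_MAPPING["llama"]  # raises KeyError on transformers < 4.28
except KeyError as err:
    print(str(err))  # -> 'llama'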
Not sure when support will be available.
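For anyone who wants to experiment in the meantime, one possible direction is to bypass the standard inference container and deploy with the Hugging Face LLM (TGI) image, which ships a newer Transformers that knows the llama architecture. This is an untested sketch reusing role and sagemaker_session from above; it assumes a recent sagemaker SDK where get_huggingface_llm_image_uri is available, and whether it can actually serve a GPTQ-quantized checkpoint depends on the container version's quantization support:

# Untested sketch: deploy via the Hugging Face LLM (TGI) container instead of
# the standard inference toolkit, so the runtime is new enough to know llama.
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Look up the TGI image for this session's region (requires a recent sagemaker SDK)
llm_image = get_huggingface_llm_image_uri("huggingface", session=sagemaker_session)

llm_model = HuggingFaceModel(
    image_uri=llm_image,
    env={
        "HF_MODEL_ID": "TheBloke/WizardLM-7B-uncensored-GPTQ",
        "SM_NUM_GPUS": "1",  # number of GPUs on the instance
    },
    role=role,
    sagemaker_session=sagemaker_session,
)

predictor = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)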