Issue deploying on SageMaker
Hi, I am trying to deploy this model on SageMaker and am running into issues I don't get with other models:
from sagemaker.huggingface import HuggingFaceModel
import boto3
from sagemaker import Session

# Replace with your access key and secret key
access_key = "key"
secret_key = "key"

# Create a boto3 session with the specified access key and secret key
boto3_session = boto3.Session(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    region_name="us-east-1"
)

# Use the boto3 session to create the IAM client
iam_client = boto3_session.client('iam')

# Create a SageMaker session with the custom boto3 session
sagemaker_session = Session(boto_session=boto3_session)

role = iam_client.get_role(RoleName='ROLE')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'TheBloke/WizardLM-7B-uncensored-GPTQ',
    'HF_TASK': 'text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    env=hub,
    role=role,
    sagemaker_session=sagemaker_session  # Pass the custom SageMaker session
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,               # number of instances
    instance_type='ml.g4dn.2xlarge'         # ec2 instance type
)
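For reference, I query the endpoint with the standard predict call. A minimal sketch (the prompt text is just a placeholder):

response = predictor.predict({
    "inputs": "Tell me about Amazon SageMaker."  # placeholder prompt
})
print(response)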
I am getting the following error when trying to query the endpoint after deployment:
{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027llama\u0027"
}
Is this a library that isn't being imported? Do I need a custom setup instead of just deploying to SageMaker? The Hugging Face inference export fails with the same error, which is probably worth bringing to your attention.
Thank you
I am getting the same error, any suggestions? I'm simply using the SageMaker deployment code listed above.
I'm afraid I don't know anything about SageMaker. But I'm happy to take pull requests if anyone figures out what's wrong.
I figured out the error, but unfortunately I don't see an immediate way to deploy this as a SageMaker endpoint. The SageMaker Hugging Face environment only supports older Transformers versions, and this model is a fine-tuned LLaMA model; the llama architecture was only added in Transformers 4.28: https://huggingface.co/decapoda-research/llama-7b-hf/discussions/39
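The bare "'llama'" message is consistent with a KeyError raised when the container's Transformers doesn't recognize the llama model type. A minimal sketch of the suspected failure mode (assuming a local install of transformers < 4.28):

# Sketch of the suspected failure mode: on transformers < 4.28 the "llama"
# model type is not registered, so the auto-config lookup raises
# KeyError('llama'), whose str() is exactly "'llama'".
from transformers.models.auto.configuration_auto import CONFIG_MAPPING

try:
    CONFIG_MAPPING["llama"]  # raises KeyError on transformers < 4.28
except KeyError as err:
    print(str(err))  # -> 'llama'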
Not sure when support will be available.
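For anyone who wants to experiment in the meantime, one possible direction is to bypass the standard inference container and deploy with the Hugging Face LLM (TGI) image, which ships a newer Transformers that knows the llama architecture. This is an untested sketch reusing role and sagemaker_session from above; it assumes a recent sagemaker SDK where get_huggingface_llm_image_uri is available, and whether it can actually serve a GPTQ-quantized checkpoint depends on the container version's quantization support:

# Untested sketch: deploy via the Hugging Face LLM (TGI) container instead of
# the standard inference toolkit, so the runtime is new enough to know llama.
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Look up the TGI image for this session's region (requires a recent sagemaker SDK)
llm_image = get_huggingface_llm_image_uri("huggingface", session=sagemaker_session)

llm_model = HuggingFaceModel(
    image_uri=llm_image,
    env={
        "HF_MODEL_ID": "TheBloke/WizardLM-7B-uncensored-GPTQ",
        "SM_NUM_GPUS": "1",  # number of GPUs on the instance
    },
    role=role,
    sagemaker_session=sagemaker_session,
)

predictor = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)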