Error when deploying SageMaker endpoint: Unsupported model type mixtral

#16
by harryneal - opened

I'm attempting to deploy to a SageMaker endpoint in eu-west-1 and I get the following error:

ValueError: Unsupported model type mixtral

I'm using the following setup:

import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# assumes this runs in a notebook/job with a SageMaker execution role attached
role = sagemaker.get_execution_role()

endpoint_name = "mixtral-8x7B-instruct-quantized"

# Hub model configuration
hub = {
    'HF_MODEL_ID': 'mistralai/Mixtral-8x7B-Instruct-v0.1',
    'SM_NUM_GPUS': json.dumps(1),
    'HF_MODEL_QUANTIZE': "bitsandbytes",
}

# create Hugging Face Model class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.1.0"),
    env=hub,
    role=role,
    name=endpoint_name
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=600,
    endpoint_name=endpoint_name
)

Any advice on this?

Also, for the record, is there any flexibility in the quantization settings, in terms of method and bit width?
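
For context on my own question: my understanding is that the container passes HF_MODEL_QUANTIZE through to TGI's --quantize flag, and the values below are the ones I've seen mentioned for recent TGI releases. Treat the list as unverified, since which options a given container accepts depends on its TGI version:

import json

# HF_MODEL_QUANTIZE values I've seen mentioned for TGI; which of these a given
# container actually accepts depends on its TGI version, so treat this as unverified.
hub = {
    'HF_MODEL_ID': 'mistralai/Mixtral-8x7B-Instruct-v0.1',
    'SM_NUM_GPUS': json.dumps(1),
    'HF_MODEL_QUANTIZE': 'bitsandbytes',        # 8-bit, quantized on the fly
    # 'HF_MODEL_QUANTIZE': 'bitsandbytes-nf4',  # 4-bit NF4, on the fly
    # 'HF_MODEL_QUANTIZE': 'bitsandbytes-fp4',  # 4-bit FP4, on the fly
    # 'HF_MODEL_QUANTIZE': 'gptq',              # needs GPTQ-quantized weights
    # 'HF_MODEL_QUANTIZE': 'awq',               # needs AWQ-quantized weights
}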

Hi @harryneal
Please see a similar issue here: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/discussions/13 and let us know if this helps

Hi @ybelkada ,
I believe these are two separate issues, as I am attempting to deploy via an AWS SageMaker endpoint using a Hugging Face container.

I believe the problem is that the Hugging Face LLM image v1.1.0 ships transformers 4.33.3, but according to the HF Mixtral blog post, Mixtral needs the latest version (4.36.0):

https://huggingface.co/blog/mixtral

Maybe this is a case of waiting a few days for the container to be updated, but in the meantime, if anyone can help me create a custom one, it would be much appreciated!
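
For anyone watching this thread, I'd expect the eventual fix to be a one-line change, i.e. pinning a newer image version once one exists; the version string below is an assumption on my part, not something I've confirmed the SDK can resolve yet:

from sagemaker.huggingface import get_huggingface_llm_image_uri

# Placeholder: pin a newer TGI container once one with transformers >= 4.36.0 ships.
# "1.3.3" is an assumed version string; it won't resolve until the SDK supports it.
image_uri = get_huggingface_llm_image_uri("huggingface", version="1.3.3")
print(image_uri)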

Resolved - the solution is in the link below, thanks Phil.

https://www.philschmid.de/sagemaker-deploy-mixtral
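
For anyone who just wants the shape of the fix: below is roughly what the linked post does, reproduced from memory, so treat the exact image version and token limits as assumptions and defer to the post itself. The key points are a newer TGI image and a larger instance:

import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# TGI 1.3.x is the first container line with Mixtral support;
# "1.3.1" is my recollection of the post's pin, check the post for the exact value.
llm_image = get_huggingface_llm_image_uri("huggingface", version="1.3.1")

config = {
    'HF_MODEL_ID': 'mistralai/Mixtral-8x7B-Instruct-v0.1',
    'SM_NUM_GPUS': json.dumps(8),  # ml.p4d.24xlarge has 8 A100 GPUs
    'MAX_INPUT_LENGTH': json.dumps(24576),  # token limits are assumptions; tune as needed
    'MAX_TOTAL_TOKENS': json.dumps(32768),
    'MAX_BATCH_PREFILL_TOKENS': json.dumps(32768),
}

llm_model = HuggingFaceModel(role=role, image_uri=llm_image, env=config)

llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.p4d.24xlarge",
    container_startup_health_check_timeout=900,  # large model, generous timeout
)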

harryneal changed discussion status to closed

Thanks very much @harryneal !

@harryneal , the guide at https://www.philschmid.de/sagemaker-deploy-mixtral requires an ml.p4d.24xlarge instance. I am also trying to deploy using SageMaker.
Have you managed to deploy a quantized version on a smaller instance?

Hi @Soraheart1988 , no, I haven't yet managed to deploy Mixtral to a smaller instance, as the latest Hugging Face inference container doesn't support its quantization when deploying to a SageMaker endpoint. I am hoping the legends at Hugging Face are working on it and will release a new update soon.

For reference this is the latest inference container:
https://github.com/aws/deep-learning-containers/releases/tag/v1.0-hf-tgi-1.3.3-pt-2.1.1-inf-gpu-py310

and this is a list of current and previous containers that I'm keeping an eye on in case of an update:
https://github.com/aws/deep-learning-containers/releases?q=tgi&expanded=true
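
On that note, a quick way to check which container your installed sagemaker SDK will resolve to (omitting the version asks for the newest image the SDK knows about, so upgrading the sagemaker package picks up newly released containers):

from sagemaker.huggingface import get_huggingface_llm_image_uri

# With no explicit version, the SDK returns the newest TGI image it knows about.
print(get_huggingface_llm_image_uri("huggingface"))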

harryneal changed discussion status to open
