Error when deploying SageMaker endpoint: Unsupported model type mixtral

#16
by harryneal - opened

I'm attempting to deploy to a SageMaker endpoint in eu-west-1 and I get the following error:

ValueError: Unsupported model type mixtral

I'm using the following setup:

import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# assumes this runs in a notebook/job with a SageMaker execution role attached
role = sagemaker.get_execution_role()

endpoint_name = "mixtral-8x7B-instruct-quantized"

# Hub model configuration
hub = {
    'HF_MODEL_ID': 'mistralai/Mixtral-8x7B-Instruct-v0.1',
    'SM_NUM_GPUS': json.dumps(1),
    'HF_MODEL_QUANTIZE': "bitsandbytes",
}

# create Hugging Face Model class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.1.0"),
    env=hub,
    role=role,
    name=endpoint_name
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=600,
    endpoint_name=endpoint_name
)

Any advice on this?

Also, for the record, is there any flexibility in the quantization settings, in terms of method and bit width?
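
For context on my own question: my understanding is that the container passes HF_MODEL_QUANTIZE through to TGI's --quantize flag, and the values below are the ones I've seen mentioned for recent TGI releases. Treat the list as unverified, since which options a given container accepts depends on its TGI version:

import json

# HF_MODEL_QUANTIZE values I've seen mentioned for TGI; which of these a given
# container actually accepts depends on its TGI version, so treat this as unverified.
hub = {
    'HF_MODEL_ID': 'mistralai/Mixtral-8x7B-Instruct-v0.1',
    'SM_NUM_GPUS': json.dumps(1),
    'HF_MODEL_QUANTIZE': 'bitsandbytes',        # 8-bit, quantized on the fly
    # 'HF_MODEL_QUANTIZE': 'bitsandbytes-nf4',  # 4-bit NF4, on the fly
    # 'HF_MODEL_QUANTIZE': 'bitsandbytes-fp4',  # 4-bit FP4, on the fly
    # 'HF_MODEL_QUANTIZE': 'gptq',              # needs GPTQ-quantized weights
    # 'HF_MODEL_QUANTIZE': 'awq',               # needs AWQ-quantized weights
}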

Hi @harryneal
Please see a similar issue here: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/discussions/13 and let us know if this helps

Hi @ybelkada ,
I believe these are two separate issues, as I am attempting to deploy via an AWS SageMaker endpoint using a Hugging Face container.

I believe the problem is that the Hugging Face LLM image v1.1.0 ships transformers 4.33.3, but according to the HF Mixtral blog post, Mixtral needs the latest version (4.36.0):

https://huggingface.co/blog/mixtral

Maybe this is a case of waiting a few days for the container to be updated, but in the meantime, if anyone can help me create a custom one, it would be much appreciated!
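
For anyone watching this thread, I'd expect the eventual fix to be a one-line change, i.e. pinning a newer image version once one exists; the version string below is an assumption on my part, not something I've confirmed the SDK can resolve yet:

from sagemaker.huggingface import get_huggingface_llm_image_uri

# Placeholder: pin a newer TGI container once one with transformers >= 4.36.0 ships.
# "1.3.3" is an assumed version string; it won't resolve until the SDK supports it.
image_uri = get_huggingface_llm_image_uri("huggingface", version="1.3.3")
print(image_uri)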

Resolved - the solution is in the link below, thanks Phil.

https://www.philschmid.de/sagemaker-deploy-mixtral
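
For anyone who just wants the shape of the fix: below is roughly what the linked post does, reproduced from memory, so treat the exact image version and token limits as assumptions and defer to the post itself. The key points are a newer TGI image and a larger instance:

import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# TGI 1.3.x is the first container line with Mixtral support;
# "1.3.1" is my recollection of the post's pin, check the post for the exact value.
llm_image = get_huggingface_llm_image_uri("huggingface", version="1.3.1")

config = {
    'HF_MODEL_ID': 'mistralai/Mixtral-8x7B-Instruct-v0.1',
    'SM_NUM_GPUS': json.dumps(8),  # ml.p4d.24xlarge has 8 A100 GPUs
    'MAX_INPUT_LENGTH': json.dumps(24576),  # token limits are assumptions; tune as needed
    'MAX_TOTAL_TOKENS': json.dumps(32768),
    'MAX_BATCH_PREFILL_TOKENS': json.dumps(32768),
}

llm_model = HuggingFaceModel(role=role, image_uri=llm_image, env=config)

llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.p4d.24xlarge",
    container_startup_health_check_timeout=900,  # large model, generous timeout
)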

harryneal changed discussion status to closed

Thanks very much @harryneal !

@harryneal , the guide at https://www.philschmid.de/sagemaker-deploy-mixtral requires an ml.p4d.24xlarge instance. I am also trying to deploy using SageMaker.
Have you managed to deploy a quantized version on a smaller instance?

Hi @Soraheart1988 , no, I haven't yet managed to deploy Mixtral to a smaller instance, as the latest Hugging Face inference container doesn't support its quantization when deploying to a SageMaker endpoint. I am hoping the legends at Hugging Face are working on it and will release a new update soon.

For reference this is the latest inference container:
https://github.com/aws/deep-learning-containers/releases/tag/v1.0-hf-tgi-1.3.3-pt-2.1.1-inf-gpu-py310

and this is a list of current and previous containers that I'm keeping an eye on in case of an update:
https://github.com/aws/deep-learning-containers/releases?q=tgi&expanded=true
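
On that note, a quick way to check which container your installed sagemaker SDK will resolve to (omitting the version asks for the newest image the SDK knows about, so upgrading the sagemaker package picks up newly released containers):

from sagemaker.huggingface import get_huggingface_llm_image_uri

# With no explicit version, the SDK returns the newest TGI image it knows about.
print(get_huggingface_llm_image_uri("huggingface"))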

harryneal changed discussion status to open
