Deployment failing on SageMaker

I don't know what's wrong here, but the deployment is failing on SageMaker:
```python
import json

from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri


def deploy_mistral():
    # TGI configuration, passed to the container as environment variables
    hub = {
        "HF_MODEL_ID": "mistralai/Mixtral-8x7B-v0.1",
        "SM_NUM_GPUS": json.dumps(8),  # shard the model across 8 GPUs
        "DTYPE": "bfloat16",
    }
    hf_model = HuggingFaceModel(
        image_uri=get_huggingface_llm_image_uri("huggingface"),
        transformers_version="4.36.0",
        env=hub,
        name="mistral-model",
        role=get_iam_role(),  # user-defined helper that returns the execution role
    )
    predictor = hf_model.deploy(
        container_startup_health_check_timeout=300,
        initial_instance_count=1,
        instance_type="ml.p4d.24xlarge",
        endpoint_name="mistral",
    )


deploy_mistral()
```
```
2023-12-12T17:48:43.576706Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 83, in serve
    server.serve(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 207, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 159, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 336, in get_model
    raise ValueError(f"Unsupported model type {model_type}")
ValueError: Unsupported model type mixtral
 rank=0
2023-12-12T17:48:43.674324Z ERROR text_generation_launcher: Shard 0 failed to start
2023-12-12T17:48:43.674344Z  INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
```
@vibranium Upgrade your transformers:

```
pip install --upgrade git+https://github.com/huggingface/transformers.git --no-cache-dir
```
@Nondzu This is a SageMaker deployment:

```python
from sagemaker.huggingface import HuggingFaceModel
```

and I already have the version pinned in the model config:

```python
transformers_version="4.36.0",
```

Did I miss something?
I face the same issue, and I also installed the newest transformers from source (commit f4db565b695582891e43a5e042e5d318e28f20b8). Could you provide some help?
Hey, can you please take a look at https://www.philschmid.de/sagemaker-deploy-mixtral? You need container version 1.3.1, which is not yet available in SageMaker.
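A minimal sketch of pinning the container build, assuming a sagemaker SDK release that already registers this TGI version; if yours doesn't, pass the full ECR image URI directly, as later comments in this thread do:

```python
from sagemaker.huggingface import get_huggingface_llm_image_uri

# Request a specific TGI build instead of whatever the SDK defaults to.
llm_image = get_huggingface_llm_image_uri("huggingface", version="1.3.1")
print(llm_image)
```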
@philschmid Thanks, it worked! Much appreciated.
@philschmid I tried following your tutorial, but I keep getting the same issue as @vibranium. Any ideas as to what the problem might be?
@seabasshn What instance size are you using? In my case it works on ml.g5.48xlarge. Also, make sure you are using the image below:

```
763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi1.3.1-gpu-py310-cu121-ubuntu20.04-v1.0
```

You would also need sagemaker version `sagemaker==2.199.0`.
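For reference, a minimal sketch of wiring that exact image into the deployment from the first post (it assumes us-east-1, so swap the region in the URI to match where you deploy; `get_iam_role()` is the same user-defined helper as in the original snippet):

```python
from sagemaker.huggingface import HuggingFaceModel

# Pin the exact TGI container instead of letting the SDK resolve one.
llm_image = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "huggingface-pytorch-tgi-inference:2.1.1-tgi1.3.1-gpu-py310-cu121-ubuntu20.04-v1.0"
)

hf_model = HuggingFaceModel(
    image_uri=llm_image,
    env={
        "HF_MODEL_ID": "mistralai/Mixtral-8x7B-v0.1",
        "SM_NUM_GPUS": "8",  # ml.g5.48xlarge has 8 x A10G GPUs
        "DTYPE": "bfloat16",
    },
    role=get_iam_role(),  # user-defined helper that returns the execution role
)

predictor = hf_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.48xlarge",
    container_startup_health_check_timeout=600,  # Mixtral weights take a while to load
)
```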
@vibranium Yes, I am using ml.g5.48xlarge and the image `763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi1.3.1-gpu-py310-cu121-ubuntu20.04-v1.0`.
@philschmid I have the same problem trying to deploy, following the instructions in your blog post exactly (same image, same instance, etc.), from SageMaker Studio (exactly as @seabasshn seems to experience too). The CloudWatch logs report a problem starting the shard. Any ideas what might be the problem? Thanks!
@philschmid, @seabasshn: problem solved. I needed TGI v1.3.3, i.e. `huggingface-pytorch-tgi-inference:2.1.1-tgi1.3.3-gpu-py310-cu121-ubuntu20.04-v1.0`, NOT v1.3.1 as described in the demo. Then it worked.
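In other words, in the deployment sketch above only the tag changes (same registry and region layout as the URIs posted earlier):

```python
llm_image = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "huggingface-pytorch-tgi-inference:2.1.1-tgi1.3.3-gpu-py310-cu121-ubuntu20.04-v1.0"
)
```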
@philschmid, are there quantized versions of the Mixtral-8x7B-v0.1 model available yet for the Hugging Face LLM DLC?
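(For context, not a confirmed answer: the LLM container forwards the HF_MODEL_QUANTIZE environment variable to TGI's --quantize flag, so deploying a quantized export would presumably only change the hub config. The model ID below is a hypothetical placeholder, not a confirmed repo.)

```python
hub = {
    "HF_MODEL_ID": "<a-gptq-export-of-Mixtral-8x7B-v0.1>",  # hypothetical placeholder
    "HF_MODEL_QUANTIZE": "gptq",  # forwarded to TGI's --quantize flag
    "SM_NUM_GPUS": "4",
}
```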
It also worked for me with TGI v1.3.3.
Nice blog post @philschmid, very neat!