Error deploying on SageMaker
Facing the same issue: `Error: ShardCannotStart` while deploying CodeLlama via Hugging Face.
Hey, you need the mainline version of 🤗 Transformers installed from git to run this (https://huggingface.co/codellama/CodeLlama-7b-hf#model-use); there's no SageMaker container for it yet (I guess you're both using the 0.9.3 container). You'll have to run it outside of SageMaker or load it directly on a notebook instance (that's what I'm doing for now, until this is supported).
I just created a notebook instance, made a Jupyter notebook inside it, and ran this code… Can you please elaborate on how to deploy?
That's about as far as I've got. I'm following the documentation here: https://huggingface.co/docs/transformers/main/model_doc/code_llama
plus the pip install from git in the README of this model, then I just use the notebook to play with it. As I said, there's no easy way to deploy it as an actual inference endpoint yet (you could build your own container with the required versions, though). Good luck!
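In case it helps, here's a rough sketch of the kind of notebook code I mean, assuming a GPU notebook instance where you've run `pip install git+https://github.com/huggingface/transformers.git accelerate` first. This is just local generation on the instance, not an endpoint; the prompt and `max_new_tokens` value are arbitrary examples.

```python
def generate(prompt: str, model_id: str = "codellama/CodeLlama-7b-hf") -> str:
    """Load CodeLlama and complete a prompt on the notebook instance."""
    # Imports kept inside the function so the file loads even where
    # transformers isn't installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" lets accelerate place the layers on the
    # available GPU(s); torch_dtype="auto" keeps the checkpoint's dtype.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", torch_dtype="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Example prompt; the model continues the function body.
    print(generate("def fibonacci(n):"))
```

Note this downloads ~13 GB of weights on first run, so make sure the instance has enough disk and GPU memory for the 7B model.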
PS: you can use TheBloke's GPTQ build and run it on multiple GPUs if you `pip install auto-gptq optimum`.
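For the GPTQ route, a minimal loading sketch might look like this, assuming `pip install auto-gptq optimum` (plus transformers from git as above). I'm using `TheBloke/CodeLlama-7B-GPTQ` as the repo id; double-check the exact name and branch on the Hub for the variant you want.

```python
def load_gptq(model_id: str = "TheBloke/CodeLlama-7B-GPTQ"):
    """Load a GPTQ-quantized CodeLlama build sharded over the visible GPUs."""
    # Imports inside the function so the file loads without the libraries installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # With auto-gptq + optimum installed, transformers picks up the GPTQ
    # quantization config from the repo; device_map="auto" spreads the
    # quantized weights over however many GPUs are available.
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return tokenizer, model
```

The quantized weights are much smaller than the fp16 checkpoint, which is what makes the multi-GPU (or even single smaller GPU) setup practical.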