Deploy the model on a cloud machine

#11
by MazenSiraj - opened

Hello all, I tried downloading the model locally, and after the download finished I ran the sample code. It showed an error related to the offload folder path, which I did not manage to solve; honestly, I don't know what that is.
So now I'm trying to deploy the model on a virtual machine to get suitable specs. I'm using RunPod,
and I get this error on the 6th shard download:
ERROR text_generation launcher: An error occurred while downloading using hf_transfer. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.

Can anyone help with either of these: a step-by-step guide for running the model locally on a regular laptop, or the steps to deploy it on the cloud and use it through an API?

Thanks

Hi @MazenSiraj ,
The offload folder issue can be solved by adding offload_folder='offload' to the from_pretrained call:
self.model = AutoModelForCausalLM.from_pretrained(path, device_map="auto", offload_folder='offload', trust_remote_code=True)
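As a minimal sketch of that fix (the directory name "offload" and the model path are assumptions; any writable directory works), the extra keyword arguments can be collected in one helper so the offload directory is guaranteed to exist before loading:

```python
import os

def offload_kwargs(offload_dir="offload"):
    """Create the offload directory and return the extra from_pretrained kwargs.

    With device_map="auto", accelerate places layers across GPU, CPU RAM, and
    disk; layers that fit nowhere else are written into offload_dir.
    """
    os.makedirs(offload_dir, exist_ok=True)
    return {
        "device_map": "auto",          # let accelerate decide layer placement
        "offload_folder": offload_dir, # where spilled weights are written
        "trust_remote_code": True,     # jais ships custom modeling code
    }

# Usage (requires transformers and a large download, so not executed here):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "inception-mbzuai/jais-13b-chat", **offload_kwargs()
# )
```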
I have submitted a pull request so that the model can be deployed on a Hugging Face Inference Endpoint:
https://huggingface.co/inception-mbzuai/jais-13b-chat/discussions/12

While the PR is being reviewed, you can check out my copy of this model, which already has those changes (see the Deploy button in the top-right corner).
Please note that you will need a beefy machine to run it. I was able to run it on GPU [large] · 4x Nvidia Tesla T4, which is $4.50 per hour; the small and medium machines were not able to run it.
https://huggingface.co/poiccard/jais-13b-chat-adn

Hi @poiccard ,
Thank you so much, I will check it out. May I ask: I tried to run it on my machine, and it ran, but every time I run the sample code it downloads everything again?
If you could share the steps to run the model and use it, that would be helpful.

Thanks

[Screenshot: Jais.png]
@poiccard this is what I get every time I run the sample code: it starts downloading all over again. I don't think this is how it should go, correct?

Hi,
How did you clone it? Make sure you have actually downloaded the bin files, not just the Git LFS pointer references:
git lfs install
git clone https://huggingface.co/inception-mbzuai/jais-13b-chat
This model is big and is divided into pieces (shards). What it tries to do next is load those shards into memory, so it is not downloading, but loading.
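One way to check that the clone really contains the weights is to look at shard file sizes: a real shard is gigabytes, while a leftover Git LFS pointer file is only a few hundred bytes. A rough sketch (the repo path, the pytorch_model-*.bin shard naming, and the 1 MB threshold are assumptions about a typical sharded checkout):

```python
from pathlib import Path

def find_suspect_shards(repo_dir, min_bytes=1_000_000):
    """Return shard files that look like LFS pointers instead of real weights.

    A Git LFS pointer is a tiny text file; anything under min_bytes in a
    13B-parameter checkout is almost certainly not an actual weight shard.
    """
    repo = Path(repo_dir)
    shards = sorted(repo.glob("pytorch_model-*.bin"))
    return [p.name for p in shards if p.stat().st_size < min_bytes]
```

If this returns any file names, re-run `git lfs pull` inside the clone to fetch the real shard contents.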

You can read more here:
https://huggingface.co/docs/accelerate/v0.19.0/en/usage_guides/big_modeling

I was not able to launch this model on my machine, but I got in contact with the model creators, and inshallah we will be working on improvements.

In the meantime, as I mentioned previously, you can deploy my version of the model on a Hugging Face Inference Endpoint ($4.50 per hour; you can put it to sleep when you don't need it).
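Once an endpoint is running, it can be called over plain HTTP. A hedged sketch (the endpoint URL and token below are placeholders you copy from your endpoint's page, and the payload shape follows the standard text-generation task format):

```python
import json
import urllib.request

ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"  # placeholder
HF_TOKEN = "hf_..."  # placeholder: your Hugging Face access token

def build_request(prompt, max_new_tokens=200):
    """Assemble the JSON body for a text-generation endpoint."""
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }

def query(prompt):
    """POST a prompt to the endpoint and return the parsed JSON response."""
    req = urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {HF_TOKEN}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```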
