Inference API not working

#2 · opened by Purronika

Hi there,
despite having a Hugging Face Pro subscription and having been granted access to the Llama 2 repository (https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), using the Inference API does not work for me (see the attached screenshot).

Where's my mistake or what am I doing wrong?
Thanks a lot for any suggestions and your help!

image.png

Hi Veronika,
Did you try the Inference API on the Llama 2 model itself? If you encounter the same error there, it means your gated model access request is still in progress; I guess it may take a couple of days to get approved by Meta.
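
For reference, a quick way to check gated access programmatically is to call the Inference API directly. This is just a minimal sketch using the requests library and an HF access token read from the HF_TOKEN environment variable; the exact endpoint and response format may differ for your setup:

import os
import requests

# Query the hosted Inference API for the gated Llama 2 chat model.
API_URL = "https://api-inference.huggingface.co/models/meta-llama/Llama-2-7b-chat-hf"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

response = requests.post(API_URL, headers=headers, json={"inputs": "Hello, who are you?"})
# A 403 response typically means the gated access request hasn't been approved yet.
print(response.status_code, response.json())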

Hi Edward,
thanks so much for the reply! Yes, I did try the Inference API and everything works there (see the attached screenshot). I was granted access several weeks ago, and that was the first thing I checked when trying to track down the error. It seems that the MedQuAD model somehow does not recognize the token; I ran into the same problem when setting up the model in Colab.
Is anyone else having the same issue?

Do you have any other suggestions to get this resolved?

image.png

Hi Veronika,
Honestly, I've no idea why you can't access the model. Since you have access to the Llama 2 model, you should also be able to access llama-2-7b-MedQuAD, because it's a public adapter of llama-2-7b-chat-hf.
If you would like to load the model in Colab or on a local machine, I do have a couple of alternatives.
Alternative 1 (use the merged model): the llama-2-7b-MedQuAD-merged model is a merged version (the base model and adapter merged together) and doesn't require Llama 2 access; see the loading sketch after the commands below.
Alternative 2 (clone the adapter files locally): run these commands in Colab

# Install Git LFS (Large File Storage) if it's not already installed.
# On Ubuntu/Colab, run the command below (see https://git-lfs.com/ for other platforms).
! apt-get install git-lfs
! git lfs install

# Clone the adapter files
! git clone https://huggingface.co/EdwardYu/llama-2-7b-MedQuAD

# Point the adapter path at the local directory you just cloned
adapter = './llama-2-7b-MedQuAD'
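
For completeness, here is a minimal sketch of how loading looks for both alternatives, assuming the standard transformers and peft APIs, that the base model is meta-llama/Llama-2-7b-chat-hf, and that the merged model lives at EdwardYu/llama-2-7b-MedQuAD-merged (adjust dtype and device settings to your hardware):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Alternative 2: load the gated base model, then attach the locally cloned adapter.
# (device_map="auto" requires the accelerate package.)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter)  # adapter = './llama-2-7b-MedQuAD' from above
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Alternative 1: load the merged model directly; no Llama 2 access required.
merged = AutoModelForCausalLM.from_pretrained(
    "EdwardYu/llama-2-7b-MedQuAD-merged",
    torch_dtype=torch.float16,
    device_map="auto",
)
merged_tokenizer = AutoTokenizer.from_pretrained("EdwardYu/llama-2-7b-MedQuAD-merged")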

Hopefully, this helps.

Hi Edward,
thanks so much for helping out and offering the alternatives! I can confirm that using the merged model works and I can continue working with it.
I really appreciate your support.

Hi Edward, I'm currently doing something quite similar and would love to get your input on it.

I've trained a version of MedQuAD and pushed the model to the Hub. Its adapter_model.safetensors is in my repo, but there is no model.safetensors file. I'm curious how pulling this model down gets the full model?

Hi Christian,

I assume your adapter is fine-tuned from a Hugging Face pretrained model.
If so, you can save your model locally or push it to the Hugging Face Hub using the built-in functions, e.g.,

# Save the adapter to local disk (optional)
model.save_pretrained('my_local_path')

# Share your model (push it to the Hugging Face Hub)
model.push_to_hub('SilveriteKey/MedQuadQuantized')

Once the fine-tuned model is pushed to the Hub, you should be able to see your adapter model in SilveriteKey/MedQuadQuantized.
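
And on the loading side, pulling the adapter repo alone does not give you a full model: the base weights are downloaded separately and the adapter is applied on top. Here is a minimal sketch, assuming the adapter was fine-tuned from meta-llama/Llama-2-7b-chat-hf with peft:

from transformers import AutoModelForCausalLM
from peft import PeftModel

# The repo only holds adapter_model.safetensors (the adapter weights);
# the full model is reconstructed by loading the base model and applying the adapter.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = PeftModel.from_pretrained(base, "SilveriteKey/MedQuadQuantized")

# Optional: merge the adapter into the base weights to get a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("my_merged_model")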
