The Serverless Inference API: "The model meta-llama/Meta-Llama-3-8B is too large to be loaded automatically (16GB > 10GB)"
I get the error "The model meta-llama/Meta-Llama-3-8B is too large to be loaded automatically (16GB > 10GB)" when using the Serverless Inference API.
Is there any way to use Meta-Llama-3-8B with the Serverless Inference API?
Thank you!
Same question here!
It's ironic, because the error says:
The model meta-llama/Meta-Llama-3-8B is too large to be loaded automatically (16GB > 10GB). Please use Spaces (https://huggingface.co/spaces) or Inference Endpoints (https://huggingface.co/inference-endpoints).
...but aren't I already using inference endpoints?
Got it working. On the website, the right-hand column specifically says:
Model is too large to load in Inference API (serverless). To try the model, launch it on Inference Endpoints (dedicated) instead.
After creating a dedicated endpoint, it works.
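For anyone else going this route: once the dedicated endpoint is running, you can call it from Python. A minimal sketch, assuming the endpoint URL and token below are placeholders; copy the real URL from your endpoint's overview page:

```python
from huggingface_hub import InferenceClient

# Hypothetical endpoint URL -- replace with the one shown on your
# Inference Endpoints dashboard once the endpoint is "Running".
client = InferenceClient(
    model="https://xxxxxx.us-east-1.aws.endpoints.huggingface.cloud",
    token="hf_...",  # a token with access to the endpoint
)

# Base model, so we just ask it to continue raw text.
output = client.text_generation("The capital of France is", max_new_tokens=20)
print(output)
```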
I'm getting the same error message. I want to use Meta-Llama-3-8B with the Serverless Inference API.
Same problem here, even with a Pro account.
Hey all. This model is not provided in the Serverless Inference API, but the instruct version is: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
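So you can query the instruct model through the serverless API directly. A quick sketch using huggingface_hub (the token is a placeholder; you also need to have accepted the model's license on the Hub):

```python
from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_...")  # your Hugging Face access token

# The instruct model is served on the serverless API, so no endpoint setup needed.
response = client.chat_completion(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Explain what a base model is in one sentence."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```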
For beginners: what's the difference between the regular and Instruct models?
Base models are optimized to generate the next token. If you want a chat-like model (à la ChatGPT), you want the instruct version, which is the base model further trained on chat-like behavior (with a series of alignment techniques).
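You can see the difference in how you prompt them. A small sketch, assuming you have local access to the gated tokenizer repo: the base model just continues raw text, while the instruct model expects its chat template (role headers, end-of-turn tokens), which the tokenizer applies for you:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Base model: feed plain text, it predicts what comes next.
base_prompt = "The capital of France is"

# Instruct model: the tokenizer wraps your message in the chat template
# the model was fine-tuned on, so it behaves like an assistant.
chat_prompt = tok.apply_chat_template(
    [{"role": "user", "content": "What is the capital of France?"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(chat_prompt)  # shows the special tokens the instruct model expects
```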