The Serverless Inference API: "The model meta-llama/Meta-Llama-3-8B is too large to be loaded automatically (16GB > 10GB)"
I get the error "The model meta-llama/Meta-Llama-3-8B is too large to be loaded automatically (16GB > 10GB)" when using the Serverless Inference API.
Is there any way to use Meta-Llama-3-8B with the Serverless Inference API?
Thank you!
Same question here!
It's ironic, because the error says:
The model meta-llama/Meta-Llama-3-8B is too large to be loaded automatically (16GB > 10GB). Please use Spaces (https://huggingface.co/spaces) or Inference Endpoints (https://huggingface.co/inference-endpoints).
...but aren't I already using inference endpoints?
Got it working. On the website, the right-hand column specifically says:
Model is too large to load in Inference API (serverless). To try the model, launch it on Inference Endpoints (dedicated) instead.
After creating a dedicated endpoint, it works.
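For anyone else going this route: once the dedicated endpoint is running, you can call it from Python. A minimal sketch, assuming the endpoint URL and token below are placeholders; copy the real URL from your endpoint's overview page:

```python
from huggingface_hub import InferenceClient

# Hypothetical endpoint URL -- replace with the one shown on your
# Inference Endpoints dashboard once the endpoint is "Running".
client = InferenceClient(
    model="https://xxxxxx.us-east-1.aws.endpoints.huggingface.cloud",
    token="hf_...",  # a token with access to the endpoint
)

# Base model, so we just ask it to continue raw text.
output = client.text_generation("The capital of France is", max_new_tokens=20)
print(output)
```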
I'm getting the same error message. I want to use Meta-Llama-3-8B with the Serverless Inference API.
Same problem here, even with a Pro account.
Hey all. This model is not provided in the Serverless Inference API, but the instruct version is: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
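So you can query the instruct model through the serverless API directly. A quick sketch using huggingface_hub (the token is a placeholder; you also need to have accepted the model's license on the Hub):

```python
from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_...")  # your Hugging Face access token

# The instruct model is served on the serverless API, so no endpoint setup needed.
response = client.chat_completion(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Explain what a base model is in one sentence."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```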
For beginners: what's the difference between the regular and Instruct models?
Base models are optimized to generate the next token. If you want a chat-like model (à la ChatGPT), you want the instruct version, which is the base model further trained on chat-like behavior (with a series of alignment techniques).
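You can see the difference in how you prompt them. A small sketch, assuming you have local access to the gated tokenizer repo: the base model just continues raw text, while the instruct model expects its chat template (role headers, end-of-turn tokens), which the tokenizer applies for you:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Base model: feed plain text, it predicts what comes next.
base_prompt = "The capital of France is"

# Instruct model: the tokenizer wraps your message in the chat template
# the model was fine-tuned on, so it behaves like an assistant.
chat_prompt = tok.apply_chat_template(
    [{"role": "user", "content": "What is the capital of France?"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(chat_prompt)  # shows the special tokens the instruct model expects
```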