Inference API (serverless)

#7 by vbaldinger - opened

Hi all!

According to the model card, this model can be loaded on the Inference API (serverless). But if I try to do so, I get the error:
The model Groq/Llama-3-Groq-8B-Tool-Use is too large to be loaded automatically (16GB > 10GB)
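For reference, this is roughly what I'm running. A minimal sketch assuming a plain POST to the serverless endpoint; the HF_TOKEN environment variable and the prompt are placeholders:

```python
import os
import requests

# Serverless Inference API endpoint for this model
API_URL = "https://api-inference.huggingface.co/models/Groq/Llama-3-Groq-8B-Tool-Use"

# HF_TOKEN is a placeholder for a valid Hugging Face access token
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

response = requests.post(API_URL, headers=headers, json={"inputs": "Hello!"})
print(response.status_code)
print(response.json())  # the "too large to be loaded automatically" error comes back here
```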

Groq org

Is this not just a limitation of the free tier?

I have the Pro subscription and can use e.g. meta-llama/Meta-Llama-3.1-405B-Instruct-FP8, so I don't think so.

Groq org

meta-llama/Meta-Llama-3.1-405B-Instruct-FP8

This is sub 10GB? :O

You might need to save the model as smaller safetensors.
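Something along these lines might work. A sketch assuming the standard transformers save_pretrained path; the 5GB shard size and the output directory are illustrative choices, not requirements:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Groq/Llama-3-Groq-8B-Tool-Use"

# Load the checkpoint once, then re-save it as safetensors split into
# smaller shards via max_shard_size
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

out_dir = "llama-3-groq-8b-tool-use-resharded"
model.save_pretrained(out_dir, safe_serialization=True, max_shard_size="5GB")
tokenizer.save_pretrained(out_dir)
```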
