Inference API (serverless)
#7 opened by vbaldinger
Hi all!
According to the model card, this model can be loaded on the Inference API (serverless). But if I try to do so, I get this error:
The model Groq/Llama-3-Groq-8B-Tool-Use is too large to be loaded automatically (16GB > 10GB)
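For reference, here is roughly how I'm calling it, a minimal sketch using huggingface_hub's InferenceClient (the token and prompt are placeholders):

```python
from huggingface_hub import InferenceClient

# Placeholder token; substitute your own HF access token
client = InferenceClient(
    model="Groq/Llama-3-Groq-8B-Tool-Use",
    token="hf_...",
)

# Simple chat request against the serverless Inference API
response = client.chat_completion(
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```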
Is this just a limitation of the free tier? I have a Pro subscription and can use e.g. meta-llama/Meta-Llama-3.1-405B-Instruct-FP8, so I don't think so.
> meta-llama/Meta-Llama-3.1-405B-Instruct-FP8

This is sub-10 GB? :O
You might need to save the model as smaller safetensors.
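If that's the route, here's a minimal sketch of re-saving the weights as sharded safetensors (the output directory and shard size are placeholders; note that re-sharding alone doesn't shrink the 16 GB total, so getting under 10 GB would likely also require quantized weights):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Groq/Llama-3-Groq-8B-Tool-Use"

# Load in half precision (an 8B model is ~16 GB in fp16)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Re-save as sharded safetensors files; "5GB" caps each shard's size
out_dir = "llama-3-groq-8b-resaved"  # placeholder path
model.save_pretrained(out_dir, safe_serialization=True, max_shard_size="5GB")
tokenizer.save_pretrained(out_dir)
```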