Issue with using the model in Spaces
from huggingface_hub import InferenceClient
client = InferenceClient(
"ehartford/dolphin-2.5-mixtral-8x7b"
)
When I try to use ehartford/dolphin-2.5-mixtral-8x7b in Spaces, I get these errors:
huggingface_hub.utils._errors.HfHubHTTPError: 403 Client Error: Forbidden for url: https://api-inference.huggingface.co/models/ehartford/dolphin-2.5-mixtral-8x7b (Request ID: bttrYLuVoD5jjxUm9RxFm)
The model ehartford/dolphin-2.5-mixtral-8x7b is too large to be loaded automatically (93GB > 10GB). Please use Spaces (https://huggingface.co/spaces) or Inference Endpoints (https://huggingface.co/inference-endpoints).
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api-inference.huggingface.co/models/ehartford/dolphin-2.5-mixtral-8x7b
Is API access to this model restricted, or is it just too large? When I tried mistralai/Mixtral-8x7B-Instruct-v0.1 with InferenceClient, it worked.
You may be launching it in a Space, but you are still using InferenceClient, which calls the hosted Inference API. In case you are new around here: models larger than 10 GB cannot be run on the free Inference API, and running a 93 GB model on a free Space is not possible either, hardware-wise.
If you ever want to REALLY run the model yourself, in a Space or on your own computer, you would not use the Inference API at all; instead you would use transformers, i.e. download the weights and run them on your own hardware (or the Space's hardware, if you run it there).
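To make the size problem concrete, here is a rough back-of-the-envelope sketch. The parameter count is an approximation for an 8x7B Mixtral-style model; the point is just that the weights alone dwarf what free-tier hardware can hold:

```python
# Rough memory estimate for an 8x7B Mixtral-style model.
# ~46.7e9 total parameters is an approximate figure, not an exact count.
TOTAL_PARAMS = 46.7e9

def weights_gb(params: float, bytes_per_param: float) -> float:
    """Approximate size of the weights alone, in gigabytes."""
    return params * bytes_per_param / 1e9

fp16 = weights_gb(TOTAL_PARAMS, 2)    # float16/bfloat16: 2 bytes per param
int4 = weights_gb(TOTAL_PARAMS, 0.5)  # 4-bit quantized: 0.5 bytes per param

print(f"fp16 weights: ~{fp16:.0f} GB")   # ~93 GB, matching the error message
print(f"4-bit weights: ~{int4:.0f} GB")  # still far beyond a free CPU Space
```

Even aggressively quantized, the weights are many times larger than the 10 GB free Inference API limit, which is why paid hardware (an upgraded Space or an Inference Endpoint) is the only way to serve it.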
I hope I managed to be of some help!
Now, about Mixtral: I was surprised too, but Mixtral uses a sparse Mixture-of-Experts architecture that activates only a fraction of its parameters for each token, which considerably reduces the compute needed per prediction, so that may be the reason it is served.
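For reference, a small sketch of that routing effect; the figures are approximate counts reported for Mixtral 8x7B (2 of 8 experts active per token), not values I have measured:

```python
# Approximate parameter counts for Mixtral 8x7B (from its model card/paper):
# all 8 experts are stored, but only 2 are routed to per token.
TOTAL_PARAMS = 46.7e9    # full set of weights that must sit in memory
ACTIVE_PARAMS = 12.9e9   # weights actually used to predict one token

fraction_active = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"fraction of parameters active per token: {fraction_active:.0%}")
```

Note this only cuts per-token compute, not memory: all 46.7B parameters still have to be loaded, so the hardware requirement to host it stays high either way.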
Thank you for helping me!