`std::runtime_error: [Matmul::eval_cpu] Currently only supports float32`
#2
by adhishthite - opened
Hello,
I am getting this error on my Mac M1 Pro running Sonoma 14.3, with Python 3.11 and the latest PyTorch installed.
Please visit this link for full output: https://app.warp.dev/block/NMbYuCAkwfcxcQ7zjhZv8n
```
In [5]: response = generate(model, tokenizer, prompt="<step>Source: user Fibonacci series in Python<step> Source: assistant Destination: user", verbose=True)

==========
Prompt: <step>Source: user Fibonacci series in Python<step> Source: assistant Destination: user
libc++abi: terminating due to uncaught exception of type std::runtime_error: [Matmul::eval_cpu] Currently only supports float32.
[1]    9782 abort      ipython
```
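For anyone hitting this, one possible (unconfirmed) workaround is to make sure evaluation is dispatched to the GPU stream, since the error indicates the CPU backend's matmul only supports float32. Below is a minimal sketch assuming the mlx-lm `load`/`generate` API; the model repo name is a placeholder, and the float32 cast is only applied to half-precision parameters, since casting packed q4 weights would corrupt them:

```python
# Unconfirmed workaround sketch; "mlx-community/model-7b-q4" is a placeholder.
import mlx.core as mx
from mlx.utils import tree_map
from mlx_lm import load, generate

# Dispatch ops to the GPU stream; the CPU matmul kernel only supports float32.
mx.set_default_device(mx.gpu)

model, tokenizer = load("mlx-community/model-7b-q4")  # placeholder repo id

# Alternative: cast half-precision parameters up to float32 so CPU matmuls
# also succeed (uses more memory; leaves packed quantized weights untouched).
model.update(
    tree_map(
        lambda p: p.astype(mx.float32) if p.dtype in (mx.float16, mx.bfloat16) else p,
        model.parameters(),
    )
)

response = generate(
    model,
    tokenizer,
    prompt="<step>Source: user Fibonacci series in Python<step> Source: assistant Destination: user",
    verbose=True,
)
```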
How much RAM does your M1 Pro have? This model requires a machine with at least 64GB to run with q4.
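(For rough sizing, arithmetic added here rather than from the thread: q4 stores about 0.5 bytes per parameter, so a 7B model is roughly 7e9 × 0.5 B ≈ 3.5 GB of weights, while a model that genuinely needs 64GB at q4 would be on the order of 70B parameters.)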
@ivanfioravanti I have the 16GB M1 Pro, but I still got this error with a 7B Q4-quantized model.
Please see my GitHub issue if you know how to fix it. Many thanks!
https://github.com/ml-explore/mlx/issues/753