Running Llama3-8B-1.58-100B-tokens on CPU

#2
by chiauho - opened
Hugging Face 1Bit LLMs org

Hi, the example given for using the model still loads and runs it on GPU. How can I run it on CPU instead? Thanks for any pointers.

Hi, sorry for the late reply. It can run on CPU, but it's slow because of the logic that unpacks the packed weights, so running it on a GPU is recommended. To run it on CPU, just specify that in the device_map: device_map="cpu"
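A minimal sketch of what that looks like with transformers, assuming a transformers version with BitNet support is installed. The repo id HF1BitLLM/Llama3-8B-1.58-100B-tokens, the Llama 3 tokenizer repo (which is gated on the Hub and needs Meta's license accepted), the bfloat16 dtype, and the helper name generate_on_cpu are all assumptions for illustration, not details confirmed in this thread. The only CPU-specific part is device_map="cpu".

```python
# Minimal sketch: load and run the 1.58-bit Llama3 model on CPU with transformers.
# Assumptions (not confirmed in the thread): both repo ids, the bfloat16 dtype,
# and the hypothetical helper name generate_on_cpu.

MODEL_ID = "HF1BitLLM/Llama3-8B-1.58-100B-tokens"      # assumed Hub repo id
TOKENIZER_ID = "meta-llama/Meta-Llama-3-8B-Instruct"   # assumed tokenizer source (gated repo)

# The CPU-specific change: place every module on the CPU.
CPU_LOAD_KWARGS = {"device_map": "cpu"}


def generate_on_cpu(prompt: str, max_new_tokens: int = 32) -> str:
    """Generate a completion entirely on CPU (expect it to be slow)."""
    # Imported lazily so this file can be imported without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, **CPU_LOAD_KWARGS
    )
    inputs = tokenizer(prompt, return_tensors="pt")  # tensors are created on CPU
    with torch.no_grad():  # inference only, no gradients needed
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

For example, print(generate_on_cpu("What is a 1-bit LLM?")) downloads the weights on first use and then generates on CPU. Keeping max_new_tokens small is a sensible first test, since each token pays the weight-unpacking cost described above.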

Ok, thank you very much.

I was hoping that a 1-bit model like this would be able to run on a CPU without a GPU, even on ARM.

If you're interested, check out this Space. It uses bitnet.cpp to run the model on CPU and is much faster: https://huggingface.co/spaces/medmekk/BitNet.cpp