Running it on CPU using pretrained weights
#35 by himanshuyadav62
from transformers import AutoTokenizer, AutoModelForCausalLM
Can we use this to run the model only on CPU?
Yes, you can run the smaller Gemma models on CPU. Just make sure not to set `device_map` to a GPU explicitly, so the model loads on CPU. You can also use a quantized version of the model to reduce memory usage. Please have a look at the gist for reference, where I run the Gemma2-2b-it model on CPU only in Google Colab.
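For illustration, here is a minimal CPU-only sketch along those lines, assuming the `google/gemma-2-2b-it` checkpoint (a gated repo, so you need to accept its license on the Hub first) and that `accelerate` is installed for the `device_map` argument; if it is not, simply omit `device_map` and the model still defaults to CPU when no GPU is visible:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumed checkpoint from this thread; gated on the Hub, accept the license first.
model_id = "google/gemma-2-2b-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cpu",           # pin to CPU explicitly (needs accelerate; or omit entirely)
    torch_dtype=torch.float32,  # full precision is the safe default on CPU
)

inputs = tokenizer("Why is the sky blue?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Expect generation to be noticeably slower than on GPU; for the 2B model it is still usable on a typical Colab CPU runtime.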