Colab demo doesn't use GPU

#2
by aaronshenhao - opened

I'm not really familiar with HF's Transformers library, so I'm not sure what's going on, but the inference demo isn't using the T4 GPU at all, while it takes up 12 GB of system RAM, which is unexpected for such a small model. Inference takes forever to complete. The demo also tries to load the base model at the same time, which crashes Colab once all the available system RAM is used.

Inference demo link: https://huggingface.co/venkycs/phi-2-instruct/blob/main/inference_phi_2_instruct.ipynb
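For reference, a minimal sketch of how the checkpoint could be placed on the GPU explicitly instead of defaulting to CPU RAM (the model ID is from the repo above; the `device_map` and half-precision choices are my assumptions, not what the notebook does):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "venkycs/phi-2-instruct"

# Pick the GPU when Colab actually exposes one, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # fp16 roughly halves the memory footprint on GPU; keep fp32 on CPU.
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    device_map="auto",  # let accelerate place the weights on the GPU if visible
)
```

Loading only this checkpoint (rather than the base model alongside it) should also keep system RAM usage well below the Colab limit.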
