Colab

#1
by Srrp - opened

How to run in colab t4?

I quantized the model down to 5-bit weights: TobDeBer/Qwen2.5-Coder-32B-Q5_K_M-GGUF
Try running the GGUF with "llama-cpp-python". It should fit on a T4 if you combine the 16 GB of GPU VRAM with CPU RAM offloading.
For a guide, see the inference section of this Colab notebook:
https://colab.research.google.com/github/R3gm/InsightSolver-Colab/blob/main/LLM_Inference_with_llama_cpp_python__Llama_2_13b_chat.ipynb
