Colab
#1 opened by Srrp
How to run this in Colab on a T4?
I quantized the model down to 5-bit weights: TobDeBer/Qwen2.5-Coder-32B-Q5_K_M-GGUF
Try running the GGUF with llama-cpp-python. It should fit on a T4 if you use the 16 GB of GPU VRAM plus CPU RAM offloading.
For a guide, just look at the inference section of this colab notebook:
https://colab.research.google.com/github/R3gm/InsightSolver-Colab/blob/main/LLM_Inference_with_llama_cpp_python__Llama_2_13b_chat.ipynb
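If you just want a minimal starting point, something like the sketch below should work, assuming a recent, CUDA-enabled build of llama-cpp-python. The `filename` glob and the `n_gpu_layers` value are guesses on my part; match the pattern to the actual .gguf file in the repo and lower the layer count if you hit out-of-memory errors.

```python
# A minimal sketch, assuming a CUDA build of llama-cpp-python
# (pip install llama-cpp-python huggingface_hub).
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="TobDeBer/Qwen2.5-Coder-32B-Q5_K_M-GGUF",
    filename="*Q5_K_M*.gguf",  # glob pattern; adjust to the file actually in the repo
    n_gpu_layers=30,           # layers offloaded to the T4; a guess, lower it on OOM
    n_ctx=4096,                # context window; smaller values save VRAM
)

out = llm("Write a Python function that reverses a string.", max_tokens=256)
print(out["choices"][0]["text"])
```

Whatever doesn't fit in the 16 GB of VRAM stays in CPU RAM, so generation will be slow, but it should run.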