Colab
#1 opened by Srrp
How to run this in Colab on a T4?
I quantized the model down to 5-bit weights: TobDeBer/Qwen2.5-Coder-32B-Q5_K_M-GGUF
Try running the GGUF with llama-cpp-python. It should fit on a T4 if you use the 16 GB of GPU VRAM plus CPU RAM offloading.
For a guide, just look at the inference section of this colab notebook:
https://colab.research.google.com/github/R3gm/InsightSolver-Colab/blob/main/LLM_Inference_with_llama_cpp_python__Llama_2_13b_chat.ipynb
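If you just want a minimal starting point, something like the sketch below should work, assuming a recent, CUDA-enabled build of llama-cpp-python. The `filename` glob and the `n_gpu_layers` value are guesses on my part; match the pattern to the actual .gguf file in the repo and lower the layer count if you hit out-of-memory errors.

```python
# A minimal sketch, assuming a CUDA build of llama-cpp-python
# (pip install llama-cpp-python huggingface_hub).
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="TobDeBer/Qwen2.5-Coder-32B-Q5_K_M-GGUF",
    filename="*Q5_K_M*.gguf",  # glob pattern; adjust to the file actually in the repo
    n_gpu_layers=30,           # layers offloaded to the T4; a guess, lower it on OOM
    n_ctx=4096,                # context window; smaller values save VRAM
)

out = llm("Write a Python function that reverses a string.", max_tokens=256)
print(out["choices"][0]["text"])
```

Whatever doesn't fit in the 16 GB of VRAM stays in CPU RAM, so generation will be slow, but it should run.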