Speed up

#11
by cute69 - opened

Hello, I'm very new to LLMs and Gemma. I have saved the model locally and am running prompts on it, but I want to run it on my GPU rather than my CPU.
Also, is there a way to speed up the process?
Thank you, and sorry if it's a bad question.


You need to install a frontend, like LM Studio, llama.cpp, or KoboldAI 1.7.
I use those, but for anything else, use Open WebUI. It's perfect when you want text-to-speech, hands-free conversation with your models, RAG, long-term memory, internet access, and more features than I could quickly list. 🙏💥❤️

I did use llama.cpp, but the response generation is kind of slow.
I want to make it faster. How do I do that?
And thank you!
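In case it helps: llama.cpp only uses the GPU if it was compiled with GPU support, and you still have to offload layers at run time with `-ngl`. A minimal sketch, assuming an NVIDIA GPU with the CUDA toolkit installed; the model filename below is just a placeholder for wherever your Gemma GGUF file lives:

```shell
# Build llama.cpp with CUDA support enabled
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release

# Run with -ngl to offload model layers to the GPU.
# 99 offloads all layers; lower it if you run out of VRAM.
./build/bin/llama-cli -m models/gemma-model.gguf -ngl 99 -p "Hello"
```

Beyond GPU offloading, using a smaller quantization of the model (e.g. a Q4 variant instead of Q8) also speeds up generation at some cost in quality, and on CPU-only runs `-t` controls the thread count.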
