GGUF Model

#7 by juanjgit - opened

I converted it to GGUF. This is the first time I've done it, so I might have done something wrong... but it is working fine for me on a 6GB Android phone.
https://huggingface.co/juanjgit/orca_mini_3B-GGUF
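
If you want to try it from Python, here is a minimal sketch using the llama-cpp-python bindings. The GGUF filename and the prompt template are assumptions; substitute whatever file you download from the repo:

```python
# Minimal sketch: load the converted GGUF model with llama-cpp-python.
# Install with: pip install llama-cpp-python
from llama_cpp import Llama

# NOTE: model_path is an assumption; use the actual file downloaded
# from https://huggingface.co/juanjgit/orca_mini_3B-GGUF
llm = Llama(model_path="orca-mini-3b.q4_0.gguf", n_ctx=2048)

# orca_mini-style prompt format (assumed)
prompt = "### User:\nExplain what GGUF is in one sentence.\n\n### Response:\n"
output = llm(prompt, max_tokens=128)
print(output["choices"][0]["text"])
```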

Wow, a 6GB Android phone! Did you measure the token generation speed? How slow/fast is it?

The good news is that I am working on releasing v2, so you could be one of the first to make a GGUF version :) Stay tuned.

Your model is the only 3B that is usable; it gives pretty good responses. And when it hallucinates, it is funny. So a v2 is very good news!
I compiled llama.cpp in Termux and I am getting 1.5-2 tokens/s.
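
In case anyone wants to reproduce that number, here is roughly how tokens/s could be measured with llama-cpp-python (the model path and prompt are assumptions):

```python
# Rough tokens/s measurement sketch with llama-cpp-python.
import time
from llama_cpp import Llama

llm = Llama(model_path="orca-mini-3b.q4_0.gguf")  # path is an assumption

start = time.time()
out = llm("### User:\nCount to ten.\n\n### Response:\n", max_tokens=64)
elapsed = time.time() - start

# The completion result reports how many tokens were actually generated.
n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens / elapsed:.2f} tokens/s")
```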
