Using GGUF format
This model was awesome. Is there any way I could convert it into GGUF format?
From what I understand, someone would need to extract the mmproj and quantize the two parts separately, as llama.cpp requires two files for multimodality (the language model plus the vision projector).
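If it helps, this is roughly how the two-file (model GGUF + mmproj GGUF) workflow looks for the existing LLaVA support in llama.cpp. Treat it as a sketch: the script and flag names vary between llama.cpp versions, the paths are just placeholders, and MiniCPM-V would need its own conversion scripts rather than the LLaVA ones shown here.

```bash
# 1. Surgery: pull the vision projector out of the HF checkpoint
#    (LLaVA uses llava-surgery.py; MiniCPM-V would need an equivalent script).
python ./examples/llava/llava-surgery.py -m ../MiniCPM-Llama3-V-2_5   # placeholder path

# 2. Convert the projector / image encoder into its own GGUF (the "mmproj" file).
python ./examples/llava/convert-image-encoder-to-gguf.py \
    -m ../vision-encoder \
    --llava-projector ../MiniCPM-Llama3-V-2_5/llava.projector \
    --output-dir ../MiniCPM-Llama3-V-2_5

# 3. Convert and quantize the language-model half separately.
python ./convert-hf-to-gguf.py ../MiniCPM-Llama3-V-2_5
./quantize ../MiniCPM-Llama3-V-2_5/ggml-model-f16.gguf \
           ../MiniCPM-Llama3-V-2_5/ggml-model-Q4_K_M.gguf Q4_K_M
```

At inference time both outputs are passed together, which is why the two files have to be quantized and shipped separately.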
GGUF format is coming soon
MiniCPM-Llama3-V 2.5 can run with llama.cpp now! See our fork of llama.cpp for more details.
And here is the MiniCPM-Llama3-V-2_5-gguf repo:
https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf
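A quick usage sketch for the released GGUFs. The binary name, branch, and file names below are best-effort assumptions based on the fork's README, so double-check there before copying:

```bash
# Build the MiniCPM-V fork of llama.cpp (branch name may differ).
git clone https://github.com/OpenBMB/llama.cpp.git
cd llama.cpp && make

# Inference needs both files: the quantized language model plus the mmproj
# vision projector downloaded from openbmb/MiniCPM-Llama3-V-2_5-gguf
# (exact file names may differ from what is shown here).
./minicpmv-cli \
    -m ./models/ggml-model-Q4_K_M.gguf \
    --mmproj ./models/mmproj-model-f16.gguf \
    --image ./demo.jpg \
    -p "What is in the image?"
```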
Thanks, this is awesome. Is there any plan to merge the update into the main llama.cpp repository?
Things like LM-Studio & KoboldCPP would benefit from a merge into main.
@saishf
@Tomy99999
We are working on it. It may take some time, because the differences between MiniCPM-V 2.0 and 2.5 make it hard to fit both into one repo. For now, we are working on Ollama.
If you guys have time for this PR, we'd appreciate it!
For reference, I created a new thread for it: https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/discussions/45
Right now MiniCPM is not working properly on llama.cpp (or any of its wrappers): it will load and run inference, but the quality is below vanilla llava-1.5. Also, the fork is not compatible anymore, and Ollama depends on the latest llama.cpp commits, so any work needs to land in llama.cpp itself to survive.
So at this point, using any of the llava-1.5 or llava-1.6 versions is the best way to go on llama.cpp/Ollama until support for the current SOTA models (Phi-v-128 and MiniCPM-2.5) is available.
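For example, with what already works upstream today (the model tag and file names below are just examples, not the only options):

```bash
# LLaVA via Ollama, already supported out of the box.
ollama run llava:13b

# Or via llama.cpp's llava-cli, again passing the two GGUF files
# (language model plus mmproj vision projector).
./llava-cli -m ggml-model-q4_k.gguf --mmproj mmproj-model-f16.gguf \
    --image photo.jpg -p "Describe this image."
```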