Anybody know what can actually load/run inference on this model?

#50
by SytanSD - opened

I have tried the following: Oobabooga Text Generation WebUI, llama.cpp, Ollama, KoboldCpp, Tabby, ExLlamaV2, and LM Studio, and not a single one of them supports this model. I am trying to use this model as an open-source, locally run alternative to GPT-4, as I do not like or wish to support OpenAI in any way, but it seems this model is designed in a way that makes running it in any pre-existing GUI impossible.

Any additional info would be massively appreciated, as I am having to put my job on hold to try and sort out a GPT-4 alternative.

Important to note: I have absolutely zero experience with Diffusers/Transformers, and very little experience with code in general. I am looking for a solution that lets me run this model as a local server, so I can point a front end at its port and have it fulfill requests from a tagging/captioning GUI.

Maybe you can use lmdeploy, which supports MiniCPM-Llama3-V-2_5. From the command line you can launch a Gradio demo, an api_server, or chat directly in the terminal.
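For reference, a minimal sketch of what that could look like with lmdeploy's Python pipeline API. The model ID and image path below are assumptions, and whether your installed lmdeploy version accepts this model through the VLM pipeline should be checked against the lmdeploy docs:

```python
# Hedged sketch: lmdeploy's pipeline API for a vision-language model.
# Assumes lmdeploy is installed with VLM support and that
# "openbmb/MiniCPM-Llama3-V-2_5" is accepted as a model ID.
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('openbmb/MiniCPM-Llama3-V-2_5')

# load_image accepts a local path or a URL; "example.jpg" is a placeholder.
image = load_image('example.jpg')
response = pipe(('Describe this image for tagging purposes.', image))
print(response.text)
```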

https://github.com/OpenBMB/llama.cpp/tree/minicpm-v2.5 - I believe you can run the llama.cpp server from this fork.
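If that server route works, a front end can talk to it over HTTP, which sounds like what you want. A rough sketch of a request against llama.cpp's OpenAI-compatible endpoint is below; the port and prompt are placeholders, and whether this particular fork's server build handles image inputs is an assumption you would need to verify:

```python
# Hedged sketch: querying a locally running llama.cpp server over its
# OpenAI-compatible /v1/chat/completions endpoint. Port 8080 is a placeholder;
# image support depends on how this specific fork's server was built.
import requests

payload = {
    "messages": [
        {"role": "user", "content": "Write three descriptive tags for a photo of a sunset over mountains."}
    ],
    "max_tokens": 128,
}

resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```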


I tried lmdeploy today after a recommendation from a colleague. It seems much more straightforward than whatever mess I had going on, but unfortunately lmdeploy bloats the model to an unusable size. Inference with the web server provided by OpenBMB works just fine in 24 GB of VRAM, even at FP16 (though it does not offer the functionality I need), but loading the model in lmdeploy balloons it to 27 GB of VRAM while converting it to the TurboMind format. Ironically, that conversion is supposed to make inference faster, but by inflating the model past what my GPU can hold, it takes my inference from 2-3 seconds per image to over 6 minutes.

Additionally, I tried to load the int4 version of the model in lmdeploy, only to find it is not actually supported... So I really don't have any workable options with lmdeploy, as much as I wish I could use it.

I once encountered the same problem as you, and it was solved by downloading the GGUF file. You can give it a try.
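In case it helps, a sketch of fetching the GGUF weights programmatically. The repo ID and filenames below are assumptions based on OpenBMB's GGUF repo and should be verified on its "Files and versions" tab; the vision projector file is needed alongside the quantized language model when using the fork linked above:

```python
# Hedged sketch: downloading GGUF weights with huggingface_hub.
# The repo ID and filenames are assumptions -- verify them on the model page.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="openbmb/MiniCPM-Llama3-V-2_5-gguf",
    filename="ggml-model-Q4_K_M.gguf",   # quantized language model (assumed name)
)
mmproj_path = hf_hub_download(
    repo_id="openbmb/MiniCPM-Llama3-V-2_5-gguf",
    filename="mmproj-model-f16.gguf",    # vision projector (assumed name)
)
print(model_path, mmproj_path)
```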
