4-bit version?
I tried doing it myself but ran into problems when using this: https://github.com/0cc4m/GPTQ-for-LLaMa (it adds support for MPT models)
I was looking into this as well. I tried to use the main GPTQ-for-LLaMa to quantize it (this model just sounds a million times more promising than the original), but I'm getting errors because it is not a llama model. I saw that about a week ago Occam released a quantized version, so it is doable (https://huggingface.co/OccamRazor/mpt-7b-storywriter-4bit-128g). I just don't know how. I also looked through Occam's GitHub with his version of KoboldAI and originally just didn't see his GPTQ implementation.
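For reference, this is roughly what I mean by it "not being a llama model": the config on the Hugging Face repo reports a custom "mpt" architecture, which is presumably why the llama-only script bails. Just a quick check with transformers (the printed values are what I'd expect, not something I've verified on every version):

```python
from transformers import AutoConfig

# MPT ships its modeling code inside the repo, so trust_remote_code is needed.
config = AutoConfig.from_pretrained(
    "mosaicml/mpt-7b-storywriter", trust_remote_code=True)

print(config.model_type)     # expected: "mpt" (not "llama")
print(config.architectures)  # expected: ["MPTForCausalLM"]
```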
Anyway, now that I see mpasila's link I'm going to try that route. I have data right now too, so if it works I'd be happy to upload a working model. Maybe TheBloke will beat me to it, hah.
Edit: I tried every which way to make the GPTQ that was linked above work. Does anyone have the sauce? I even tried the gptneox path, which at least failed in a different way (CUDA memory overrun). When I try to run it with the llama version, it errors out every time complaining that the tokenizer is not compatible with the NeoX-style tokenizer.
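My guess on that tokenizer error: the llama path tries to load a LLaMA (sentencepiece) tokenizer, while the storywriter repo ships a GPT-NeoX-style tokenizer, so the classes don't match. A minimal check, assuming nothing beyond the public mosaicml/mpt-7b-storywriter repo (I haven't confirmed the exact class name across transformers versions):

```python
from transformers import AutoTokenizer

# Let AutoTokenizer pick the tokenizer class bundled with the repo
# instead of forcing a LLaMA-specific one.
tok = AutoTokenizer.from_pretrained("mosaicml/mpt-7b-storywriter")

print(type(tok).__name__)                 # expected: GPTNeoXTokenizerFast
print(tok("Once upon a time").input_ids)  # sanity-check that encoding works
```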
I also tried installing it two different ways: the old way with the conda env, and the new way of making a fresh conda env and then running the pip install git command they have listed on the repo. Couldn't get the pip install way to work at all.
I will have a look tomorrow if I have the time
So if I had to guess, we need that layer mapping...
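Rough sketch of what I mean, in case it helps: the per-block linear layers that GPTQ needs to quantize sit under different names in MPT than in LLaMA. The names below are my reading of the two modeling files, so treat them as a guess to verify against the fork:

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

# Per-block nn.Linear layers GPTQ quantizes in each architecture
# (my reading of the LLaMA and MPT modeling code; verify against the fork).
LLAMA_LINEARS = ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj",
                 "self_attn.o_proj", "mlp.gate_proj", "mlp.up_proj", "mlp.down_proj"]
MPT_LINEARS = ["attn.Wqkv", "attn.out_proj", "ffn.up_proj", "ffn.down_proj"]

def find_linears(block):
    """Collect every nn.Linear inside one transformer block, similar to the
    find_layers helper GPTQ-for-LLaMa applies to LLaMA decoder layers."""
    return {name: mod for name, mod in block.named_modules()
            if isinstance(mod, nn.Linear)}

# Note: this downloads the full weights, so it needs plenty of RAM/disk.
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-storywriter",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True)

# MPT keeps its decoder blocks under transformer.blocks,
# not model.layers like LLaMA does.
first_block = model.transformer.blocks[0]
print(sorted(find_linears(first_block)))  # should roughly match MPT_LINEARS
```

If that mapping is right, pointing the fork's sequential quantization loop at transformer.blocks and those four linears per block would presumably be most of the change.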
Looking forward to it! @TheBloke Thanks! :D