Has anyone tried fine-tuning this model directly?
I am trying to fine-tune this model on a simple task using DeepSpeed ZeRO-3, but I am unable to overfit even on 10 examples. It feels like a bug: the model mostly generates meaningless predictions. I have tried a wide variety of learning rates.
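Roughly, my setup looks like the sketch below (simplified, with placeholder values; the model id, dataset, and the ZeRO-3 JSON path are stand-ins for what I'm actually running):

```python
# Simplified sketch of my fine-tuning setup. Model id, data, hyperparameters,
# and the DeepSpeed config path are placeholders, not my exact run.
import torch
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "gretelai/mpt-7b"  # placeholder for this repo's model id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token  # MPT tokenizer has no pad token by default

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # MPT uses custom modeling code
)

# ~10 toy examples standing in for my real task data.
texts = [f"input {i} -> output {i}" for i in range(10)]
train_ds = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    num_train_epochs=50,         # deliberately trying to overfit
    learning_rate=2e-5,          # one of many values I tried
    bf16=True,
    logging_steps=1,
    deepspeed="ds_zero3.json",   # placeholder path to my ZeRO-3 config
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```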
Has anyone successfully fine-tuned this model? I can share the exact config and setup in a bit, but perhaps my question is resolvable/discussable without that. cc
@misberner
Thanks!
Rajhans
Rajhans - this is a quantized version of the MPT-7B model that's used in our fine-tuning loop with the Gretel service. It should work fine, assuming you're using the same or similar training parameters to what we use in our docs: https://docs.gretel.ai/reference/synthetics/models/gretel-gpt
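For reference, submitting a Gretel GPT fine-tuning job from the Python client looks something along these lines (sketch from memory; the blueprint name, project name, and data file here are illustrative, and the linked docs are the authoritative reference for the exact configuration):

```python
# Rough sketch of kicking off a Gretel GPT fine-tuning job via the Python client.
# Blueprint name, project name, and data source are illustrative; see the docs
# linked above for the exact, up-to-date settings.
from gretel_client import configure_session
from gretel_client.helpers import poll
from gretel_client.projects import create_or_get_unique_project
from gretel_client.projects.models import read_model_config

configure_session(api_key="prompt", cache="yes", validate=True)

project = create_or_get_unique_project(name="gretel-gpt-finetune")
config = read_model_config("synthetics/natural-language")  # Gretel GPT blueprint (name from memory)

model = project.create_model_obj(model_config=config, data_source="train.csv")
model.submit_cloud()
poll(model)  # wait for the fine-tuning job to finish
```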
One more thing - as it's the base version of the model and not a chat model, it's mostly useful if you're fine-tuning with a significant amount of data, probably a minimum of several thousand examples. Cheers