Has anyone tried fine-tuning this model directly?
I am trying to fine-tune this model on a simple task using DeepSpeed ZeRO-3, but I am unable to overfit even on 10 examples. It feels like a bug: the model mostly generates meaningless predictions. I have tried a wide variety of learning rates.
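Roughly, my setup looks like the sketch below (simplified, with placeholder values; the model id, dataset, and the ZeRO-3 JSON path are stand-ins for what I'm actually running):

```python
# Simplified sketch of my fine-tuning setup. Model id, data, hyperparameters,
# and the DeepSpeed config path are placeholders, not my exact run.
import torch
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "gretelai/mpt-7b"  # placeholder for this repo's model id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token  # MPT tokenizer has no pad token by default

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # MPT uses custom modeling code
)

# ~10 toy examples standing in for my real task data.
texts = [f"input {i} -> output {i}" for i in range(10)]
train_ds = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    num_train_epochs=50,         # deliberately trying to overfit
    learning_rate=2e-5,          # one of many values I tried
    bf16=True,
    logging_steps=1,
    deepspeed="ds_zero3.json",   # placeholder path to my ZeRO-3 config
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```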
Has anyone successfully fine-tuned this model? I can share the exact config and setup in a bit, but perhaps my question is resolvable/discussable without that. cc
@misberner
Thanks!
Rajhans
Rajhans - this is a quantized version of the MPT-7B model that's used in our fine-tuning loop with the Gretel service. It should work fine, assuming you're using the same or similar training parameters to what we use in our docs: https://docs.gretel.ai/reference/synthetics/models/gretel-gpt
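For reference, submitting a Gretel GPT fine-tuning job from the Python client looks something along these lines (sketch from memory; the blueprint name, project name, and data file here are illustrative, and the linked docs are the authoritative reference for the exact configuration):

```python
# Rough sketch of kicking off a Gretel GPT fine-tuning job via the Python client.
# Blueprint name, project name, and data source are illustrative; see the docs
# linked above for the exact, up-to-date settings.
from gretel_client import configure_session
from gretel_client.helpers import poll
from gretel_client.projects import create_or_get_unique_project
from gretel_client.projects.models import read_model_config

configure_session(api_key="prompt", cache="yes", validate=True)

project = create_or_get_unique_project(name="gretel-gpt-finetune")
config = read_model_config("synthetics/natural-language")  # Gretel GPT blueprint (name from memory)

model = project.create_model_obj(model_config=config, data_source="train.csv")
model.submit_cloud()
poll(model)  # wait for the fine-tuning job to finish
```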
One more thing - as it's the base version of the model and not a chat model, it's mostly useful if you're fine-tuning with a significant amount of data, probably a minimum of several thousand examples. Cheers