What does the T parameter do for slerp merges?

#2
by lemon07r - opened

Was wondering what the T parameter does for slerp merges.

Hi, if I understand the mergekit documentation correctly, t controls how much each model influences the parameters of the new model.

SLERP
Spherically interpolate the parameters of two models. One must be set as base_model.
Parameters:
t - interpolation factor. At t=0 will return base_model, at t=1 will return the other one.

For example, if t is closer to 0, then the new model will be closer to the base model, and if it is closer to 1, then it will be more like the second model.

models:
  - model: recoilme/recoilme-gemma-2-9B-v0.4
  - model: lemon07r/Gemma-2-Ataraxy-v2-9B
merge_method: slerp
base_model: recoilme/recoilme-gemma-2-9B-v0.4
dtype: bfloat16
parameters:
  t: 0.25

So in this case, with t: 0.25, the new model is closer to recoilme/recoilme-gemma-2-9B-v0.4.
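Under the hood, slerp interpolates each pair of weight tensors along an arc on a hypersphere rather than along a straight line. Here's a minimal numpy sketch of the idea (my own illustration of the formula, not mergekit's actual implementation, which handles more edge cases and per-layer t schedules):

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors.
    At t=0 this returns v0 (the base model's weights); at t=1 it returns v1."""
    v0 = np.asarray(v0, dtype=np.float64)
    v1 = np.asarray(v1, dtype=np.float64)
    # Normalize copies to find the angle between the two tensors
    n0 = v0 / np.linalg.norm(v0)
    n1 = v1 / np.linalg.norm(v1)
    dot = np.clip(np.dot(n0, n1), -1.0, 1.0)
    # If the tensors are nearly colinear, fall back to plain linear interpolation
    if 1.0 - abs(dot) < eps:
        return (1.0 - t) * v0 + t * v1
    omega = np.arccos(dot)  # angle between the tensors
    so = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / so) * v0 + (np.sin(t * omega) / so) * v1
```

With t=0.25 the sin weights favor the first argument, which is why the merge above leans toward the base model.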

I've only recently started looking into this, so if that's not right, I'd love to hear how it actually works.
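One other thing worth knowing: t doesn't have to be a single number. If I remember the mergekit README correctly, you can give t a per-layer gradient and filter it by tensor name, something like this (the values here are placeholders for illustration, not a recommended recipe):

```yaml
models:
  - model: recoilme/recoilme-gemma-2-9B-v0.4
  - model: lemon07r/Gemma-2-Ataraxy-v2-9B
merge_method: slerp
base_model: recoilme/recoilme-gemma-2-9B-v0.4
dtype: bfloat16
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]  # interpolated across the layers
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5                    # default for all other tensors
```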

That's really neat. Wish I knew about this sooner tbh. Would have helped with my other merges a lot. This also means I have way more merges to try now..

On another note, I didn't get very good results with Ataraxy v2. I think the ifable model it's merged with is a little "heavy", or hard to work with. It's ended up more like a sidegrade to v1.

There's also the model stock merge, which has given me good benchmark results but feels pretty awful in actual use. I've been trying to reread the paper to get a handle on how it's best used, but I still have no idea.

There's also the delta merge. I saw someone call it the SOTA method, but I'm not sure what makes it SOTA. They did, however, get fantastic results with Gemma Advanced 2.1. Personally I've gotten pretty good results with it too in a few experimental models, BUT they score very poorly in benchmarks. I haven't been able to find any papers or anything on this merge method though.

Slerp merges have, for the most part, been the best all-around method for me so far.

As far as finetunes go, talking with a few finetuners I've learned that online-policy training beats offline-policy training. For preference optimization that means SPPO should be the best, followed by WPO, with SimPO being the most inferior, at least on paper. Gutenberg is Gutenberg; nbeerbower has made a new Gutenberg dataset, so it'll be interesting to see how that affects finetuning. He trains his Gemma 9B on Gutenberg starting from the SPPO model, so it's quite good, and his new one using both Gutenberg datasets scores around 29 on the Open LLM Leaderboard as well, a very impressive model. I've been trying to be greedy and find a merge recipe that can take it a step further, but I haven't figured out something that works well yet.

I'm curious btw, what's the makeup of your recoilme models? Roughly. You don't need to give me the exact recipe, I'm just curious which main models are part of its DNA.

On the last question, if I understand correctly, you're interested in how the recoilme (recoilme/recoilme-gemma-2-9B-v0.4) model is put together. If so, there might be a misunderstanding, because I probably don't name my models correctly. In that case it would be better to ask the recoilme model's creator. I'd also be interested in the best way to name models.

Oh, you're right. Didn't realize it wasn't the same person. @recoilme

The t: 0.25 doesn't come from my merge, idk where it's from.
Also, recoilme/recoilme-gemma-2-9B-v0.4 has the best eval score, but it's censored.

I use this model, https://huggingface.co/recoilme/Gemma-2-Ataraxy-Gemmasutra-9B-slerp, in my Telegram bot (charsaibot); it's uncensored.

As I understand it, @lemon07r wanted to know how the recoilme/recoilme-gemma-2-9B-v0.4 model is structured, or am I mistaken?

That's correct, I wanted to know the recipe/parent models for your models and how they were made, @recoilme