Control vector discussion

#2
by ChuckMcSneed - opened

Continuation of:
https://huggingface.co/jukofyork/Dark-Miqu-70B/discussions/3
I've succeeded in removing slop from CR+ for both sfw and nsfw scenarios using control vectors. Strangely, the sfw unslop control vector did not affect nsfw slop, and the nsfw control vector made the model extra horny, which in my opinion is an undesirable side effect. While the sfw vector managed to stay coherent during my stress tests, the nsfw vector caused poor commandr to disintegrate: it didn't know what to say without any of those overused phrases from erotic fiction that the control vector stopped from appearing. It looks like the issue for nsfw is at a much deeper level: the data the model learned it from is very monotonous, and when forced to write in a different style, it doesn't know what to do. This is most likely what makes it so difficult to remove nsfw slop using regular prompting techniques.

Well darn...

I'm making more progress with control vectors!
https://huggingface.co/ChuckMcSneed/control_vectors/blob/main/command-r-plus/bio/control_vector-commandr-bio.gguf
I tuned this one on very descriptive biological language as positive and vague flowery prose as negative. Seems to make it more aware of the biology and surroundings of characters.
https://huggingface.co/ChuckMcSneed/control_vectors/blob/main/command-r-plus/incharacter/control_vector-commandr-incharacter.gguf
This one makes the model act slightly more in character, but the improvement is not very significant as commandr is already quite good at it.

the nsfw vector caused poor commandr to disintegrate: it didn't know what to say without any of those overused phrases from erotic fiction that the control vector stopped from appearing. It looks like the issue for nsfw is at a much deeper level: the data the model learned it from is very monotonous, and when forced to write in a different style, it doesn't know what to do.

This may actually just be a problem with the "two class" control vectors! I have even managed to completely stop a model from being able to write a story because of this... To explain the problem in simple terms:

Think about a clock face with a shorter hour hand and a longer minute hand:

  • When the time is 12:00 both hands point in the same direction, but there is still a gap between the tips of the two hands. These sorts of vectors are not what we want at all because moving in either direction will just make the model more or less "storyish", and ultimately these are what cause the model to get crippled like you describe. Even times like 12:05 or 11:50 have this same problem.
  • When the time is 6:00 (or any other time where the two hands point in opposite directions), this is a good control vector that clearly moves from the undesirable to the desirable direction.

This is the problem I've been grappling with for the last 2 weeks:

  • If the "hands" are both long and well defined then cosine similarity works fine: it outputs a number similar to correlation and 1.0 is like the 12:00 example above and -1.0 is like the 6:00 example above (and 0.0 is like 3:00 or 9:00; ie: 90 degrees). This can then be used to filter out these shitty "storyish" directions, but...
  • There isn't really a good reason that the things we are interested in create a clear "axis" like this, and it turns out that often the case will be like a really long minute hand and a tiny/stubby hour hand... Cosine similarity doesn't work in this case as the direction of the tiny hand has noise added to it and can point in wildly different directions as a result.

So after lots of experimenting with this, I think I may finally have worked out a method of detecting these shitty directions:

Flip the direction of one of the hands and see if it gets easier to discriminate between our two classes!!!

  • If the time is 12:00 and you flip either hand to get 6:00 or 12:30 then it's clear the gap between the tips of the hands has increased! This is a shitty direction for a control vector.
  • If the time is 6:00 and you flip either hand then the gap has clearly decreased! This is a good direction for a control vector.
  • This works fine even when one hand is tiny in length.
  • This works for 12:05, 11:50, 6:00, etc type directions.
  • The 3:00 or 9:00 type directions (ie: 90 degrees) are the directional pairs where we get no change.

So what I am doing now is performing SVD to decompose the gap into lots of directions, testing each one and only keeping those that pass the above test, then finally reconstructing the final direction to only include the "good" directions.
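
To make that concrete, here's a rough sketch of the flip test in code (illustrative only: the array names, the rank cut-off and the crude separation measure below are stand-ins, not the exact code I'm running):

```python
# Rough sketch of the flip test (assumed names; not the exact implementation).
# A_pos, A_neg: (n_samples, hidden_dim) activations for the two classes,
# baseline-subtracted and matched pairwise (row i of each comes from the same prompt).
import numpy as np

def filtered_direction(A_pos, A_neg, top_k=16):
    # SVD of the paired differences gives candidate directions spanning the "gap".
    _, _, Vt = np.linalg.svd(A_pos - A_neg, full_matrices=False)

    def sep(x, y):
        # Crude discriminability: distance between class means over the pooled std.
        return abs(x.mean() - y.mean()) / (np.sqrt(0.5 * (x.var() + y.var())) + 1e-8)

    mean_gap = A_pos.mean(axis=0) - A_neg.mean(axis=0)
    good = np.zeros_like(mean_gap)
    for v in Vt[:top_k]:               # each v is a unit direction in hidden space
        p, n = A_pos @ v, A_neg @ v    # scalar projections of every sample onto v
        # Flip one "hand": if the classes get EASIER to separate, it was a 12:00-style
        # direction (discard); if they get HARDER to separate, it was 6:00-style (keep).
        if sep(p, -n) < sep(p, n):
            good += (mean_gap @ v) * v  # keep only this direction's component of the gap
    return good
```

In practice this would be done per layer, and any measure of how well the projections of the two classes can be told apart along v would do the job.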

I still need to run some more tests but will likely have this perfected in a couple of days and will upload the new control vectors and the code to create your own.

Also @BigHuggyD @ChuckMcSneed you might find this interesting if you are using command-r models:

https://huggingface.co/datasets/froggeric/creativity/discussions/6#66851beae526dd77799c25bd

I'm making more progress with control vectors!
https://huggingface.co/ChuckMcSneed/control_vectors/blob/main/command-r-plus/bio/control_vector-commandr-bio.gguf
I tuned this one on very descriptive biological language as positive and vague flowery prose as negative. Seems to make it more aware of the biology and surroundings of characters.
https://huggingface.co/ChuckMcSneed/control_vectors/blob/main/command-r-plus/incharacter/control_vector-commandr-incharacter.gguf
This one makes the model act slightly more in character, but the improvement is not very significant as commandr is already quite good at it.

I'll have to look into your method as I'm currently using 30,000 samples to do what you seem to be doing with 5!? I think my collection of story prompts is a bit shit as it's pretty hard to write a Grimdark story when the prompt says "Write a story about being overjoyed on the day of your graduation." or similar :/

I definitely think you need more samples though. PCA is basically just eigen-decomposition of a covariance matrix, and statistically it can be shown that even in the very best case you need O(d) samples to reliably estimate the covariance matrix:

https://stats.stackexchange.com/questions/90045/how-many-samples-are-needed-to-estimate-a-p-dimensional-covariance-matrix

and command-r-plus has around 11.5k variables in its hidden dimension, while most other large 70b+ models have 8192 variables in theirs.

I'm using 2 classes and a baseline, 10 system prompts per triple, and 1k prompts per system prompt = 3 x 10 x 1000 = 30000 samples. But I also have matched pairs that get subtracted from the baseline which should reduce the error in the covariance matrix even further.

A simple hacky test you could try would be to train your control vectors 5 times but leave one of the 5 prompts out each time. Then test and see if you get wildly different results... If you do then you need to increase the sample size, but if you don't then this must mean that only a tiny tiny fraction of command-r-plus's 11.5k variables are changing hugely in magnitude for your prompts (which would be very surprising).
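
Something like this would do it (`train_control_vector()` and the prompt list here are just placeholders for whatever your actual pipeline does):

```python
# Hypothetical jackknife stability check; train_control_vector() and prompts are
# placeholders for whatever the actual training pipeline looks like.
import numpy as np
from itertools import combinations

def jackknife_check(prompts, train_control_vector):
    # Re-train once per left-out prompt.
    vectors = [np.asarray(train_control_vector(prompts[:i] + prompts[i + 1:]))
               for i in range(len(prompts))]
    # Stable directions (cosines near 1.0) suggest the sample size is enough;
    # wildly varying cosines mean the estimate is dominated by noise.
    for i, j in combinations(range(len(vectors)), 2):
        a, b = vectors[i], vectors[j]
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        print(f"leave-out {i} vs leave-out {j}: cosine = {cos:.3f}")
```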

I'm using 2 classes and a baseline, 10 system prompts per triple, and 1k prompts per system prompt = 3 x 10 x 1000 = 30000 samples. But I also have matched pairs that get subtracted from the baseline which should reduce the error in the covariance matrix even further.

Oh wow... That's really huge... Are all of those synthetic? I'm using high quality "cyborg" data: generated by the model but heavily edited by a human (me) as positive, with the "mean" method; more of my time goes into dataset generation than into training. You know that models have in-context learning, so my theory was that if I show it how to write (cyborg) vs how not to write (synthetic), I would get a better control vector out of it than if I just throw it some starters with a prompt, and it seems to do just as I want. In the stories part, I try to keep as few variables as possible from changing, so they don't get affected by the control vector. Also, keeping the prompts equal length helps with the quality of the control vector, especially when they are short: >400-token prompts can take a 10-token variation much better than <100-token prompts.

I'll have to look into your method as I'm currently using 30,000 samples to do what you seem to be doing with 5!? I think my collection of story prompts is a bit shit as it's pretty hard to write a Grimdark story when the prompt says "Write a story about being overjoyed on the day of your graduation." or similar :/

Wait, you put that into positive too? It should be "Write a very sad story with a very bad ending about the day of your graduation." vs "Write a very happy story with a very good ending about the day of your graduation."

I'm using 2 classes and a baseline, 10 system prompts per triple, and 1k prompts per system prompt = 3 x 10 x 1000 = 30000 samples. But I also have matched pairs that get subtracted from the baseline which should reduce the error in the covariance matrix even further.

Oh wow... That's really huge... Are all of those synthetic? I'm using high quality "cyborg" data: generated by the model but heavily edited by a human (me) as positive, with the "mean" method; more of my time goes into dataset generation than into training. You know that models have in-context learning, so my theory was that if I show it how to write (cyborg) vs how not to write (synthetic), I would get a better control vector out of it than if I just throw it some starters with a prompt, and it seems to do just as I want. In the stories part, I try to keep as few variables as possible from changing, so they don't get affected by the control vector. Also, keeping the prompts equal length helps with the quality of the control vector, especially when they are short: >400-token prompts can take a 10-token variation much better than <100-token prompts.

I'm using a mix of different story prompt datasets I found and a set of 10 matched system prompts that go with these.

I'll have to look into your method as I'm currently using 30,000 samples to do what you seem to be doing with 5!? I think my collection of story prompts is a bit shit as it's pretty hard to write a Grimdark story when the prompt says "Write a story about being overjoyed on the day of your graduation." or similar :/

Wait, you put that into positive too? It should be "Write a very sad story with a very bad ending about the day of your graduation." vs "Write a very happy story with a very good ending about the day of your graduation."

Even though the prompts are pretty trash, I think this might actually be quite a good thing and encourage the model to just generally "be dark" or "be chaotic" and not just when specifically asked to "write a grimdark story", etc.

It seems to have worked anyway, as the new control vectors are way better than the old ones from this repo.

I'm now also skipping the last layer (which it looks like you are also doing - from looking inside your .safetensors files?). The last layer seems to be an oddball and can have activations 10-100x larger than the previous layer(s). The way I have the scale factors working now, the early layers are fine to fiddle with and just get really tiny offsets added that do almost nothing if the direction is weak.
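
If you want to eyeball the last-layer blow-up yourself, something like this works (the model name is just an example; any causal LM exposes its hidden states the same way):

```python
# Rough way to inspect the per-layer hidden-state magnitudes
# (the model name is just an example; any causal LM works the same way).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "CohereForAI/c4ai-command-r-plus"  # example only
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16, device_map="auto")

inputs = tok("Write a story about the day of your graduation.", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output; the rest follow each decoder layer.
for i, h in enumerate(out.hidden_states):
    print(f"layer {i:3d}: mean hidden-state norm = {h.norm(dim=-1).mean().item():.1f}")
```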

Later in the week I will investigate using the "Cross Correlation Matrix" again, as I now have a much better idea of how to test for the shitty "storyish" directions that killed this before.

I'm also gonna think about what other traits I can try - "purple prose" isn't really something I encounter, as I mostly just try to get them to write "dark" stories and my main enemy is redemption arcs and stupid "steeled themselves for the challenges to come" BS.

@jukofyork
Thanks for the distributed training info earlier.

increase the Entropy much more was causing the models to get dumb and not follow instructions

I remember when people were merging Nemo finetunes and having issues with instruction following, someone did a quick finetune with Claude synthetic data:

https://huggingface.co/Undi95/LocalC-12B-e2.0

Maybe adding a sample of structured instruction following could help with what you're doing.

@BigHuggyD Are you exl2-quanting the new Mistral-Large? (mistralai/Mistral-Large-Instruct-2411) :)

I have very bad news about new Largestral:
[image: chart of top name-token probabilities]
Top 10 name tokens are 83% now, that's a new overfitting record. Top 5 are 76%. Fuck... I'll test it more tomorrow.

@jukofyork
Thanks for the distributed training info earlier.

increase the Entropy much more was causing the models to get dumb and not follow instructions

I remember when people were merging Nemo finetunes and having issues with instruction following, someone did a quick finetune with Claude synthetic data:

https://huggingface.co/Undi95/LocalC-12B-e2.0

Maybe adding a sample of structured instruction following could help with what you're doing.

@BigHuggyD Are you exl2-quanting the new Mistral-Large? (mistralai/Mistral-Large-Instruct-2411) :)

No, I'm almost certain I know the cause of this for my setup: I'm basically making the model think it's outputting a much sharper distribution (ie: like qwen-2.5 in @ChuckMcSneed's image above), and in turn the model is forced to recalibrate itself... BUT: if you push this too far, it will start to shrink the norm of the hidden states, so that the vector going into the final lm_head tensor is smaller, as a way of "cheating" around me only allowing it to alter the down_proj tensors and not the lm_head tensor... This in turn screws up all the attention blocks, as they get inputs with very different magnitudes to what they expect.

So the solution is to "fight back" by encouraging the model to make more orthogonal changes (ie: rotations instead of scaling).

It's taken all week to work out how to get this working in qlora-pipe, but it's definitely working now, so I should be able to trick the models into thinking they are outputting even sharper distributions without this screwing the model up.
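
A toy example of the "cheating" (random weights only, nothing to do with the real model or the qlora-pipe setup), just to show why shrinking the hidden-state norm is such an easy way out:

```python
# Toy demo with random weights: shrinking the hidden-state norm flattens the output
# distribution without the lm_head (or anything upstream of it) actually improving.
import torch

torch.manual_seed(0)
hidden_dim, vocab = 64, 1000
lm_head = torch.randn(vocab, hidden_dim) / hidden_dim ** 0.5
h = torch.randn(hidden_dim)

def entropy(logits):
    p = torch.softmax(logits, dim=-1)
    return -(p * p.clamp_min(1e-12).log()).sum().item()

for scale in (1.0, 0.5, 0.25):
    print(f"hidden-state scale {scale}: entropy = {entropy(lm_head @ (scale * h)):.2f} nats")
# Smaller hidden state -> smaller logits -> higher entropy, so the model can satisfy
# the objective by just shrinking its residual stream instead of genuinely recalibrating.
```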

I have very bad news about new Largestral:
[image: chart of top name-token probabilities]
Top 10 name tokens are 83% now, that's a new overfitting record. Top 5 are 76%. Fuck... I'll test it more tomorrow.

I'm almost certain I can fix this sort of thing now to some degree, but I spent all last night downloading the fucking "consolidated tensors" from the repo lol.

I just noticed I did the same thing. "Why is my SSD full" lol

This model is better at following / sticking to instructions for non-writing purposes (didn't take the time to test that after seeing the images above)

I just noticed I did the same thing. "Why is my SSD full" lol

This model is better at following / sticking to instructions for non-writing purposes (didn't take the time to test that after seeing the images above)

Yeah, I saw they changed the prompt to include a system prompt now.

I think the worst thing will be if they start filtering out any pirated books due to fear of copyright claims, as that will be 100% impossible to fix (for us mere mortals anyway...).

I'm starting to use my dataset with all the occult and esoteric texts in it now too, so I'm hoping the name Entropy will start to go way up: fiction books only have so many names, and in the 8k context blocks used for fine-tuning they probably have 3-5 different names max... The occult books will have 100s in the same 8k context! Hopefully Lilith and Baphomet don't become the new Elara and Malachi though :D

@BigHuggyD Are you exl2-quanting the new Mistral-Large? (mistralai/Mistral-Large-Instruct-2411) :)

Been living on a rock these last 10 days (literally, vacationing in the mountains), so I had no clue LOL. I'll heat up the oven, even though it doesn't sound like it is starting off promising...

Yeah, I saw they changed the prompt to include a system prompt now.

Sometimes it outright ignores it and goes into the assistant persona at low context. What was even the point of adding it if it doesn't work consistently? It also got more censored and less compliant than the 2407 version. So, a sidegrade instead of an upgrade?
