@grimjim on Hugging Face: "I've come across theoretical justification for my prior experimentation with…"

grimjim

posted an update Aug 3

I've come across theoretical justification for my prior experimentation with extremely low-weight merges: they amount to flattening a model so that its "massive activation" features remain significant contributors. An extremely low merge weight also effectively sparsifies the contributing model relative to the base model, but in a way that still preserves relationships within the flattened latent space. In the paper "Massive Activations in Large Language Models", the authors observed that "very few activations exhibit significantly larger values than others (e.g., 100,000 times larger)", which in turn implies a lower bound on the merge weight at which extremely low-weight merging is still effective.
https://arxiv.org/abs/2402.17762
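
As a rough illustration of the flattening and sparsifying effect described above, here is a minimal sketch of a plain linear merge on a toy vector. Everything in it is an assumption for illustration: the post does not say which merge method was used, the function names `low_weight_merge` and `surviving_indices` are hypothetical, and the numbers are invented. It operates on a bare list of values rather than on real model tensors or activations.

```python
def low_weight_merge(base, donor, weight):
    """Elementwise linear merge: (1 - weight) * base + weight * donor."""
    if len(base) != len(donor):
        raise ValueError("base and donor must have the same length")
    return [(1.0 - weight) * b + weight * d for b, d in zip(base, donor)]


def surviving_indices(base, merged, tolerance):
    """Indices where the merge moved the base value by more than `tolerance`."""
    return [i for i, (b, m) in enumerate(zip(base, merged)) if abs(m - b) > tolerance]


# Toy values, invented for illustration only.
base = [0.5, -0.2, 0.1, 0.3]
donor = [0.6, -0.1, 1000.0, 0.2]  # index 2 stands in for a "massive" feature
weight = 0.001                     # hypothetical extremely low merge weight

merged = low_weight_merge(base, donor, weight)
kept = surviving_indices(base, merged, tolerance=0.01)
print(kept)  # [2]
```

With this weight, the ordinary entries of the donor shift the base by about 0.0001 each, while the single very large entry shifts it by about 1.0. Under these toy assumptions, the donor's contribution is effectively reduced to its one outsized feature. Lowering the weight far enough would eventually push even that entry below the tolerance.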

maldv

Aug 4

PEFT would enhance this effect I would think.

In this post

grimjim Jim Lai
maldv Praxis Maldevide