@grimjim on Hugging Face: "To demonstrate that it was possible, I performed a "trapezoid" gradient merge…"

Hugging Face

Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Back to feed

grimjim

posted an update Oct 14

Post

1950

To demonstrate that it was possible, I performed a "trapezoid" gradient merge of a Llama 3 8B model onto Llama 3.1 8B Instruct, favoring the L3.1 model at the ends in order to preserve coherence and limiting the influence of the L3 model to at most 0.1 weight. Tested to 16k context length.
grimjim/Llama-Nephilim-Metamorphosis-v2-8B

In this post

grimjim Jim Lai