Jim Lai

grimjim

AI & ML interests

Experimenting primarily with 7B-12B parameter text completion models. Not all models are intended for direct use; some are aimed instead at educational and/or merge purposes.

Posts 14

Post
To demonstrate that it was possible, I performed a "trapezoid" gradient merge of a Llama 3 8B model onto Llama 3.1 8B Instruct, favoring the L3.1 model at both ends of the layer stack to preserve coherence and capping the influence of the L3 model at a weight of 0.1. Tested out to 16k context length. A sketch of the layer-weight schedule appears below.
grimjim/Llama-Nephilim-Metamorphosis-v2-8B
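
As a rough illustration only (not the actual recipe used for the model above), here is a minimal Python sketch of how a trapezoid per-layer weight schedule could be applied when blending a donor model into a base model. The layer count, ramp length, example paths, and the choice to leave embeddings, the final norm, and the LM head untouched are all assumptions made for the sake of the example.

```python
# Illustrative sketch: blend a donor model into a base model with a
# "trapezoid" per-layer weight schedule (0 at the ends, peaking at 0.1
# in the middle). NUM_LAYERS, RAMP, and the example paths are assumptions.
import torch
from transformers import AutoModelForCausalLM

NUM_LAYERS = 32   # Llama 3 / 3.1 8B use 32 decoder layers
PEAK = 0.1        # maximum influence of the donor (L3) model
RAMP = 8          # layers over which the donor weight ramps up/down (assumed)

def trapezoid_weight(layer_idx: int) -> float:
    """Donor weight for a given decoder layer: 0 at both ends, PEAK in the middle."""
    up = layer_idx / RAMP
    down = (NUM_LAYERS - 1 - layer_idx) / RAMP
    return PEAK * max(0.0, min(1.0, up, down))

def merge(base_path: str, donor_path: str, out_path: str) -> None:
    base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.bfloat16)
    donor = AutoModelForCausalLM.from_pretrained(donor_path, torch_dtype=torch.bfloat16)
    donor_params = dict(donor.named_parameters())
    with torch.no_grad():
        for name, param in base.named_parameters():
            # Only decoder layers are blended; embeddings, final norm, and
            # lm_head stay with the base (L3.1) model in this sketch.
            if ".layers." not in name:
                continue
            layer_idx = int(name.split(".layers.")[1].split(".")[0])
            w = trapezoid_weight(layer_idx)
            if w > 0.0:
                param.lerp_(donor_params[name], w)  # param <- (1 - w) * param + w * donor
    base.save_pretrained(out_path)

# Hypothetical usage (donor path is a placeholder):
# merge("meta-llama/Llama-3.1-8B-Instruct", "some-org/llama-3-8b-model", "./merged-8b")
```

With these assumed settings, the donor contribution rises linearly over the first eight layers, plateaus at 0.1 through the middle of the stack, and falls back to 0 by the final layer, which is what keeps the L3.1 model dominant at the ends.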

Post
I was reading through an abstract and found myself wondering how much LLM performance is being left on the table due to insufficient curation of training datasets: "Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning" by Kaur, Park, Goyal, and Arora.
https://arxiv.org/abs/2408.14774
In particular, the observation that "Introducing low quality answers ("shirkers") in 20% of Instruct-SkillMix examples causes performance to plummet..." had me wondering how many ostensibly good datasets out there are in fact populated with a significant number of "shirkers".