Model Merging Papers
Collection of relevant papers about model merging
Qualitatively characterizing neural network optimization problems
Paper • 1412.6544 • Published • 4
Note: Background reading to understand loss landscapes in deep neural networks and why local minima are not a problem for SGD.
Averaging Weights Leads to Wider Optima and Better Generalization
Paper • 1803.05407 • Published • 2
Merging Models with Fisher-Weighted Averaging
Paper • 2111.09832 • Published • 1
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Paper • 2203.05482 • Published • 6
Note: It's the classical merge method: a simple weighted average.
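As a rough illustration of what "a simple weighted average" means here, the sketch below averages matching parameters across checkpoints. It is not taken from the paper: the function name `weighted_average` is hypothetical, and plain Python lists stand in for the tensors a real checkpoint would hold.

```python
def weighted_average(state_dicts, weights=None):
    """Average matching parameters across checkpoints (uniform by default).

    Assumption: every checkpoint has the same keys, and each value is a
    flat list of floats standing in for a tensor.
    """
    if weights is None:
        weights = [1.0] * len(state_dicts)
    total = sum(weights)  # normalise so the weights effectively sum to 1
    merged = {}
    for name in state_dicts[0]:
        # Group the i-th entry of this parameter from every checkpoint.
        columns = zip(*(sd[name] for sd in state_dicts))
        merged[name] = [
            sum(w * v for w, v in zip(weights, values)) / total
            for values in columns
        ]
    return merged


model_a = {"w": [1.0, 2.0]}
model_b = {"w": [3.0, 4.0]}
uniform_soup = weighted_average([model_a, model_b])
skewed_soup = weighted_average([model_a, model_b], weights=[1.0, 3.0])
```

With no weights given, every checkpoint contributes equally; passing `weights` lets one checkpoint dominate the merge.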
Editing Models with Task Arithmetic
Paper • 2212.04089 • Published • 6
Note: A new paradigm for modifying the behavior of neural networks using "task vectors," which represent directions (rather than magnitudes) in the weight space of a pre-trained model and point towards improved performance on a specific task.
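A minimal sketch of the task-vector idea, assuming a task vector is the element-wise difference between fine-tuned and pre-trained weights and that vectors are added back to the base with a scaling coefficient. The helper names `task_vector` and `apply_task_vectors` and the `scale` parameter are hypothetical, and flat Python lists stand in for tensors.

```python
def task_vector(base, finetuned):
    """Direction in weight space from the base model to a fine-tuned one."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def apply_task_vectors(base, vectors, scale=1.0):
    """Add the scaled sum of task vectors to the base weights.

    A negative `scale` moves away from a task instead of towards it.
    """
    merged = {}
    for name, params in base.items():
        # Sum the task vectors entry by entry for this parameter.
        summed = [sum(vals) for vals in zip(*(v[name] for v in vectors))]
        merged[name] = [p + scale * s for p, s in zip(params, summed)]
    return merged


base = {"w": [1.0, 1.0]}
tuned_a = {"w": [2.0, 1.0]}
tuned_b = {"w": [1.0, 3.0]}
vectors = [task_vector(base, tuned_a), task_vector(base, tuned_b)]
multi_task = apply_task_vectors(base, vectors)
damped = apply_task_vectors(base, vectors, scale=0.5)
```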
Dataless Knowledge Fusion by Merging Weights of Language Models
Paper • 2212.09849 • Published
Resolving Interference When Merging Models
Paper • 2306.01708 • Published • 13
Note: The TIES-merging approach addresses the problem of interference between parameters from different models: redundant parameters and sign conflicts.
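The two sources of interference named in the note can be illustrated with a simplified sketch: trimming removes low-magnitude (redundant) entries, and a per-parameter sign election resolves sign conflicts before averaging. This is a toy version on flat lists of task-vector values, not the authors' implementation; `trim`, `ties_merge`, and the `density` parameter are hypothetical names.

```python
def trim(vector, density):
    """Keep roughly the top `density` fraction of entries by magnitude.

    Entries tied with the threshold are all kept, so slightly more than
    the requested fraction may survive.
    """
    k = max(1, int(density * len(vector)))
    threshold = sorted((abs(v) for v in vector), reverse=True)[k - 1]
    return [v if abs(v) >= threshold else 0.0 for v in vector]


def ties_merge(task_vectors, density=0.2):
    """Trim each vector, elect a sign per parameter, average the agreeing values."""
    trimmed = [trim(v, density) for v in task_vectors]
    merged = []
    for values in zip(*trimmed):
        # The elected sign is the one carrying the larger total magnitude.
        sign = 1.0 if sum(values) >= 0 else -1.0
        agreeing = [v for v in values if v * sign > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


vec_a = [4.0, -1.0, 0.5, 3.0]
vec_b = [-2.0, -3.0, 0.1, 1.0]
merged_vector = ties_merge([vec_a, vec_b], density=0.5)
```

In this example the first parameter has a sign conflict (4.0 versus -2.0); the positive side wins the election, so only 4.0 contributes to the merged value. The merged vector would then be scaled and added to the base model's weights.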
Early Weight Averaging meets High Learning Rates for LLM Pre-training
Paper • 2306.03241 • Published • 2
Platypus: Quick, Cheap, and Powerful Refinement of LLMs
Paper • 2308.07317 • Published • 23
Model Merging by Uncertainty-Based Gradient Matching
Paper • 2310.12808 • Published • 6
Note: Examines the theoretical properties and practical implications of weight averaging applied to larger generative models.
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Paper • 2311.03099 • Published • 28
Note: DARE is a novel approach to model merging. It is similar to TIES, with two main differences: it prunes delta parameters at random rather than by magnitude, and it rescales the parameters that survive.
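The pruning and rescaling steps can be sketched as follows, assuming each delta (fine-tuned minus base) entry is dropped independently with probability `drop_rate` and the survivors are multiplied by 1 / (1 - drop_rate) so the expected value of each entry is unchanged. The function name `dare` and the `seed` parameter are hypothetical, and a flat list stands in for a tensor.

```python
import random


def dare(delta, drop_rate, seed=0):
    """Randomly zero delta entries and rescale the ones that remain."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    keep_scale = 1.0 / (1.0 - drop_rate)
    return [
        0.0 if rng.random() < drop_rate else d * keep_scale
        for d in delta
    ]


delta = [0.5, -1.0, 0.25, 2.0]
sparse_delta = dare(delta, drop_rate=0.5)
untouched = dare(delta, drop_rate=0.0)
```

The sparsified deltas from several fine-tuned models would then be combined and added back to the shared base weights.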
WARM: On the Benefits of Weight Averaged Reward Models
Paper • 2401.12187 • Published • 18
Model Stock: All we need is just a few fine-tuned models
Paper • 2403.19522 • Published • 10