AutoMerger created the best 7B model on the Open LLM Leaderboard
By randomly combining top models from the Open LLM Leaderboard, AutoMerger created YamshadowExperiment28-7B. The model is three weeks old and has been at the top of the leaderboard for a week now. It was created through a simple SLERP merge (sketched after the list) of:
- automerger/YamShadow-7B (another top model created by AutoMerger)
- yam-peleg/Experiment28-7B (a top model from @yam-peleg)
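For context, SLERP interpolates each pair of matching weight tensors along an arc between the two parents instead of averaging them linearly. The snippet below is a minimal, hypothetical PyTorch sketch of that idea, not AutoMerger's actual code (which relies on mergekit); the helper names and the single global interpolation factor `t` are my own assumptions for illustration.

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors (illustrative sketch)."""
    # Flatten and normalise to measure the angle between the two tensors
    v0_f, v1_f = v0.flatten().float(), v1.flatten().float()
    v0_n = v0_f / (v0_f.norm() + eps)
    v1_n = v1_f / (v1_f.norm() + eps)
    dot = torch.clamp(torch.dot(v0_n, v1_n), -1.0, 1.0)
    omega = torch.arccos(dot)

    # Nearly colinear tensors: fall back to plain linear interpolation
    if omega.abs() < 1e-4:
        return (1.0 - t) * v0 + t * v1

    sin_omega = torch.sin(omega)
    coeff0 = torch.sin((1.0 - t) * omega) / sin_omega
    coeff1 = torch.sin(t * omega) / sin_omega
    return (coeff0 * v0_f + coeff1 * v1_f).reshape(v0.shape).to(v0.dtype)

def merge_state_dicts(sd_a: dict, sd_b: dict, t: float = 0.5) -> dict:
    """Apply SLERP tensor-by-tensor to two models with identical architectures."""
    return {name: slerp(t, sd_a[name], sd_b[name]) for name in sd_a}
```

In practice, mergekit also lets you vary the interpolation factor per layer type (e.g., attention vs. MLP), which this sketch leaves out for brevity.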
1/ On the Open LLM Leaderboard, it managed to outperform the excellent M7-7b model, which has been the #1 7B model for a while now.
2/ On the YALL leaderboard, YamshadowExperiment28-7B is ranked as the 9th best-performing automerge (but note that the scores are very close to each other). Compared to others, it does not perform particularly well on AGIEval or Bigbench.
3/ Thanks to @sam-paech, I have scores on EQ-Bench, where it managed to outperform all of my previous models. It even surpasses recent models such as DBRX Instruct, Qwen1.5 32B Chat, and Cohere's Command R+.
Surprisingly, it doesn't handle the ChatML or Mistral Instruct templates, unlike my other merges (which are part of its family tree). Alpaca works well 99% of the time, but the model can occasionally produce long runs of "INST" tokens for no reason.
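If you want to try it yourself, here is a minimal, hypothetical usage sketch with an Alpaca-style prompt via transformers; the prompt text and generation settings are assumptions for illustration, not the exact setup I used for the benchmarks above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "automerger/YamshadowExperiment28-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Alpaca-style prompt, since ChatML and Mistral Instruct did not work reliably for me
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what a SLERP merge is in one sentence.\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```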
In my experiments, YamshadowExperiment28-7B doesn't seem smarter than other successful merges like AlphaMonarch. On the contrary, I found several mathematical or reasoning problems where it fails.
Considering these results, it looks like it might be overfitting the Open LLM Leaderboard. I guess that's anything but surprising when you randomly merge 156 models.
Model: automerger/YamshadowExperiment28-7B
AutoMerger: mlabonne/AutoMerger