Upcycling Experiments (collection)
Models I pre-trained by initialising sparse mixture-of-experts (SMoE) models from dense model weights, following the upcycling process used for Qwen1.5-MoE-A2.7B (or something similar).
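To make the idea of upcycling concrete, here is a minimal sketch of the generic "copy the dense FFN into every expert" form of sparse upcycling. It is not necessarily the exact recipe used for Qwen1.5-MoE-A2.7B or for these models. The helper names (`dense_ffn`, `upcycle`, `moe_forward`), the ReLU feed-forward block, the top-k softmax router and the toy sizes are all illustrative assumptions, written in plain Python so that it runs without extra libraries.

```python
import copy
import math
import random

Matrix = list[list[float]]


def matvec(m: Matrix, v: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def dense_ffn(params: dict, x: list[float]) -> list[float]:
    """Toy two-layer feed-forward block: W_out @ relu(W_in @ x)."""
    h = [max(0.0, a) for a in matvec(params["w_in"], x)]
    return matvec(params["w_out"], h)


def upcycle(dense: dict, num_experts: int, d_model: int, seed: int = 0) -> dict:
    """Hypothetical helper: every expert starts as a copy of the dense FFN."""
    rng = random.Random(seed)
    experts = [copy.deepcopy(dense) for _ in range(num_experts)]
    # The router is the only newly initialised parameter (small random values here).
    router = [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(num_experts)]
    return {"experts": experts, "router": router}


def moe_forward(moe: dict, x: list[float], top_k: int) -> list[float]:
    """Route x to the top_k experts and mix their outputs by softmax gate weights."""
    logits = matvec(moe["router"], x)
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the selected experts only, so the gate weights sum to 1.
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    out = [0.0] * len(x)
    for i in top:
        y = dense_ffn(moe["experts"][i], x)
        out = [o + (exps[i] / total) * yi for o, yi in zip(out, y)]
    return out


# Usage with toy sizes (assumed, not taken from the real models).
rng = random.Random(42)
d_model, d_ff = 4, 8
dense = {
    "w_in": [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_ff)],
    "w_out": [[rng.uniform(-1, 1) for _ in range(d_ff)] for _ in range(d_model)],
}
moe = upcycle(dense, num_experts=8, d_model=d_model)
x = [0.5, -1.0, 0.25, 2.0]
print(dense_ffn(dense, x))
print(moe_forward(moe, x, top_k=2))  # same values up to floating-point rounding
```

Because every expert is an identical copy and the gate weights sum to 1, the upcycled model initially computes the same function as the dense model, so continued pre-training starts from the dense checkpoint's quality rather than from scratch; the experts only diverge as training updates them. Some recipes also perturb the copies or split the FFN into smaller experts to break this symmetry; that is not shown here.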
This model is a fine-tuned version of gabrielmbmb/Upcycled-Qwen1.5-MoE2.7B on the wiki_demo dataset.
Model description, intended uses and limitations, and training and evaluation data: more information needed.
The following hyperparameters were used during training: