did Mixtral start from Mistral or from-scratch?

#64
by DaehanKim - opened

Hi Mistral Team,

Thank you for sharing these excellent models.

I just saw a paper from Upstage AI that uses Mistral 7B weights to build a 10B-parameter model, and they compared it to a Mixture-of-Experts architecture. I'm wondering whether Mixtral (8x Mistral 7B) was trained from scratch or initialized from pretrained weights like Mistral 7B, since reusing them would be straightforward. The blog post only says it was "pre-trained on data extracted from the open Web".

Thanks in advance!

DaehanKim changed discussion title from did Mixtral start from Minstral or from-scratch? to did Mixtral start from Mistral or from-scratch?

"Exciting times with the new Mixtral model from @MistralAI
! It’s evident that they’ve fine-tuned the Mistral 7B model to an impressive 8x. The significant correlation between the weights of the two models is a testament to the successful reuse of models. This approach could empower the OSS community with its own robust MoE!"

https://twitter.com/tianle_cai/status/1734188749117153684
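For anyone who wants to poke at this themselves, here is a minimal sketch (not the analysis from the tweet) of how one could compare the shared, non-expert weights of the two checkpoints. It assumes the Hugging Face repos `mistralai/Mistral-7B-v0.1` and `mistralai/Mixtral-8x7B-v0.1` and the standard `transformers` parameter naming (`self_attn.q_proj`, etc.); loading both models takes a lot of RAM and disk, so treat it as illustrative only.

```python
# Sketch: compare attention-projection weights of Mistral 7B and Mixtral 8x7B.
# Assumes both checkpoints are accessible locally or via the HF Hub; this is
# an illustrative probe, not the tweet author's exact methodology.
import torch
from transformers import AutoModelForCausalLM

mistral = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)
mixtral = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1", torch_dtype=torch.bfloat16
)

sd_a, sd_b = mistral.state_dict(), mixtral.state_dict()

# The attention projections exist in both models with identical shapes
# (Mixtral's MoE only replaces the MLP blocks), so they can be compared directly.
for name, w_a in sd_a.items():
    if "self_attn" not in name:
        continue
    w_b = sd_b.get(name)
    if w_b is None or w_b.shape != w_a.shape:
        continue
    cos = torch.nn.functional.cosine_similarity(
        w_a.float().flatten(), w_b.float().flatten(), dim=0
    )
    print(f"{name}: cosine similarity = {cos.item():.4f}")
```

A cosine similarity close to 1.0 on many layers would point toward initialization from Mistral 7B, while values near 0 would suggest independent training, though only the Mistral team can say for sure.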
