Base Model or Finetuned Version?
It's not really clear from your description whether this is the extracted base model or whether you already did finetuning on top of it.
If the latter, which data and prompt format did you use?
I'd be interested in just the extracted base model without any additional finetuning.
@jphme
You're good! It's a fine-tune. I will release the base model along with the v2 LoRA, so anyone who wants to fine-tune it with LoRA, either from my checkpoint or from scratch, can. My wifi bandwidth can only go so far, and I haven't slept since Mixtral 22B dropped 😅. Also, the safetensors files are almost done uploading; I'd say about 15 more minutes.
@flozi00 Yeah, all computation was done locally; my room's a bit toasty right now 😂. Here is my Twitter, I just followed you (from your HF profile) so we can DM: https://twitter.com/mejia_petit . I would love to talk more about this! (Preferably tomorrow, I haven't slept since Mixtral 22B dropped.)
Please drop a quantized version.
@jphme @flozi00 I have untrained versions of each raw extracted expert as dense Mistral 22B models.
https://huggingface.co/thomasgauthier/Unmixtraled-22B-v0.1-expert-0
The other experts are also on my profile.
There is also this one, which is a linear merge of all experts into one model: https://huggingface.co/thomasgauthier/Unmixtraled-22B-v0.1-lerp
They all output gibberish for the most part; expert 2 seems to be the most coherent from my limited tests. Expert 0 has the lowest perplexity on wikitext, but I wasn't able to generate coherent text with it.
I'll be sharing code and evals in the next few hours.
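In the meantime, here is a rough sketch of what such an extraction (and the lerp merge) could look like. This is not the code referenced above, just an illustration: it assumes the Hugging Face transformers module naming for Mixtral (`block_sparse_moe.experts.{j}.w1/w2/w3`) and Mistral (`mlp.gate_proj/up_proj/down_proj`), and the repo id is an assumption.

```python
import torch
from transformers import AutoModelForCausalLM, MistralConfig, MistralForCausalLM

MOE_ID = "mistralai/Mixtral-8x22B-v0.1"  # source MoE checkpoint (assumption)
EXPERT = 0                               # which expert to extract
NUM_EXPERTS = 8
LERP = False                             # True = average all experts instead

moe = AutoModelForCausalLM.from_pretrained(MOE_ID, torch_dtype=torch.bfloat16)

# Dense Mistral config with the same dimensions as a single Mixtral expert.
cfg = MistralConfig(
    vocab_size=moe.config.vocab_size,
    hidden_size=moe.config.hidden_size,
    intermediate_size=moe.config.intermediate_size,
    num_hidden_layers=moe.config.num_hidden_layers,
    num_attention_heads=moe.config.num_attention_heads,
    num_key_value_heads=moe.config.num_key_value_heads,
    max_position_embeddings=moe.config.max_position_embeddings,
    rms_norm_eps=moe.config.rms_norm_eps,
    rope_theta=moe.config.rope_theta,
)
dense = MistralForCausalLM(cfg).to(torch.bfloat16)

src, dst = moe.state_dict(), dense.state_dict()
proj_map = {"gate_proj": "w1", "up_proj": "w3", "down_proj": "w2"}

for name in dst:
    if ".mlp." in name:
        # e.g. "model.layers.12.mlp.gate_proj.weight"
        parts = name.split(".")
        layer, proj = parts[2], parts[4]
        experts = [
            src[f"model.layers.{layer}.block_sparse_moe.experts.{j}.{proj_map[proj]}.weight"]
            for j in range(NUM_EXPERTS)
        ]
        dst[name] = torch.stack(experts).mean(dim=0) if LERP else experts[EXPERT]
    elif name in src:
        # Attention, norms, embeddings and lm_head share names between the two
        # architectures; the MoE router (block_sparse_moe.gate) is dropped entirely.
        dst[name] = src[name]

dense.load_state_dict(dst)
dense.save_pretrained("Unmixtraled-22B-lerp" if LERP else f"Unmixtraled-22B-expert-{EXPERT}")
```

Note that instantiating the full 8x22B in bf16 like this needs a few hundred GB of memory; streaming the safetensors shards from disk is the more practical route.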
Would love to know the prompt format as well.
Thank you.
@Winmodel Working to get an AWQ quant of this, debugging a few errors.
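For reference, the usual AutoAWQ flow looks roughly like the sketch below; the source repo id is an assumption and the config values are just AutoAWQ's common defaults, not necessarily what the finished quant will use.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "Vezora/Mistral-22B-v0.1"   # assumed source repo
quant_path = "Mistral-22B-v0.1-AWQ"      # hypothetical output dir
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model, run AWQ calibration, then save the 4-bit weights.
model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```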
Those were some of my findings as well. I also found expert 2 to be the only one to write consistent English words. Granted, its output was completely unrelated to what I asked, but at least it was an attempt; every other expert truly did become an "expert in language", as shown here: https://huggingface.co/blog/moe#what-does-an-expert-learn. Some would have finicky spacing and symbols, and some were just mangled nonsense.
Alpaca! V2 is almost done, and it's also Alpaca, but in multi-turn raw format. (So it's the same thing for you, just more work prepping the dataset for me.)
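For anyone prompting v1, the standard single-turn Alpaca template looks like this; whether the optional `### Input:` block was also used is an assumption left out here.

```python
# Standard Alpaca prompt template (no-input variant); v2 reportedly keeps the
# same format but multi-turn.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

prompt = ALPACA_TEMPLATE.format(instruction="Summarize what a mixture-of-experts layer does.")
print(prompt)
```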
Great! I look forward to V2 :)
Thank you!! V2 is essentially the test to see whether using all experts equally is the best thing to do, or just using a single one; by increasing the data size by 8x I will easily be able to verify the knowledge of the model. There are still other methods I have yet to try, so I'm not done. I'm going to keep going until I get a 22B that outperforms Mistral 7B, as expected of a 22B model.
I am uploading some GGUF quants (with importance matrix) here: https://huggingface.co/qwp4w3hyb/Mistral-22B-v0.1-iMat-GGUF
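For anyone wanting to make this kind of quant themselves, the typical llama.cpp importance-matrix workflow looks roughly like the sketch below. The binary and file names are assumptions (newer llama.cpp builds prefix the tools with `llama-`), and this is not necessarily how that particular repo was produced.

```python
import subprocess

F16_GGUF = "mistral-22b-v0.1-f16.gguf"     # full-precision GGUF export (hypothetical name)
CALIB_TXT = "calibration.txt"              # plain-text calibration corpus
IMATRIX = "imatrix.dat"
OUT_GGUF = "mistral-22b-v0.1-Q4_K_M.gguf"  # hypothetical output name

# 1) Run the model over the calibration text and record activation statistics.
subprocess.run(["./imatrix", "-m", F16_GGUF, "-f", CALIB_TXT, "-o", IMATRIX], check=True)

# 2) Quantize to Q4_K_M, weighting quantization error by the collected importance matrix.
subprocess.run(
    ["./quantize", "--imatrix", IMATRIX, F16_GGUF, OUT_GGUF, "Q4_K_M"],
    check=True,
)
```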