Update README.md
README.md CHANGED
@@ -1,13 +1 @@
----
-datasets:
-- crumb/Wizard-EvolInstruct70k-k4
-language:
-- en
-tags:
-- switch_transformers
-- llama
-- MoE
----
-This is the very first testing switchllama model from MoLora2, starting from OpenLlama-3b-v2 and adding 4 experts in the MLP blocks of the model. The experts were trained with QLora and merged properly (in 4bit) after individually training adapters on `gate_proj, up_proj, down_proj`. The 4 expert models were trained on clusters from [crumb/Wizard-EvolInstruct70k-k4](https://huggingface.co/datasets/crumb/Wizard-EvolInstruct70k-k4) then their trained MLP weights were taken and transplanted in a model initialized from OpenLlama-3b with 4 switchtransformer experts. The routers are not trained in this version of the model and are randomly initialized.
-
-Modeling code is not included until this proof-of-concept is entirely trained.
+This is the initial model, warm-started from OpenLlama-3b-v2 with the expert layers copied from models trained with LoRA on only the MLP layers. If you can find the problem with the modeling code, I will send you a picture of my cat.
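
For anyone reading the removed description above, here is a minimal sketch of what a switch-style MLP block with four Llama-style experts could look like. It is not the MoLora2 modeling code (which has not been released); the class names, the `expert_sd` state dicts, and the OpenLlama-3b sizes in the comments are assumptions for illustration, and the router is left randomly initialized as described.

```python
# Minimal sketch of switch-style top-1 routing over four Llama-style MLP experts.
# This is NOT the MoLora2 modeling code; names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LlamaStyleMLP(nn.Module):
    """Llama MLP block: down_proj(silu(gate_proj(x)) * up_proj(x))."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


class SwitchMLP(nn.Module):
    """Top-1 (Switch Transformers style) routing over num_experts MLPs.

    The router is randomly initialized, matching the note above that the
    routers are untrained in this version of the model.
    """

    def __init__(self, hidden_size: int, intermediate_size: int, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [LlamaStyleMLP(hidden_size, intermediate_size) for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden)
        probs = self.router(x).softmax(dim=-1)  # (batch, seq, num_experts)
        gate, idx = probs.max(dim=-1)           # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                # Scale each token's expert output by its router probability.
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out


# Hypothetical transplant step: copy each expert's LoRA-merged MLP weights
# (state dicts `expert_sd[e]`, an assumption) into the matching expert slot.
# moe = SwitchMLP(hidden_size=3200, intermediate_size=8640)  # OpenLlama-3b sizes
# for e in range(4):
#     moe.experts[e].load_state_dict(expert_sd[e])
```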