Crystalcareai committed
Commit 314568f
Parent(s): 550faaf
Update README.md

README.md CHANGED
@@ -1,3 +1,12 @@
-<p align="center">
-
-
+<p align="center"> <img src="https://huggingface.co/Crystalcareai/LlaMoE-Medium/resolve/main/resources/ddb-nye2T3C3vZwJJm1l6A.png" width="350" title="LlaMoE-Medium model image"> </p>
+
+This is a 4x8b Llama Mixture of Experts (MoE) model, trained on the OpenHermes Resort subset of the Dolphin-2.9 dataset.
+
+The model is a combination of 4 Llama fine-tunes, using the DeepSpeed-MoE architecture. All experts are active for every token.
+
+This is a VERY good model, somewhere between Llama 8B and Llama 70B in capability. Enjoy!
+
+Thank you to:
+
+CrusoeEnergy for sponsoring the compute for this project
+My collaborators Eric Hartford and Fernando (has too many names) Neto
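
The updated card describes the architecture but not how to load the model, so here is a minimal usage sketch. It assumes the weights load through the standard transformers API; the repo id `Crystalcareai/LlaMoE-Medium` is inferred from the image URL above, and `trust_remote_code=True`, the bfloat16 dtype, and the generation settings are assumptions rather than documented requirements, since the DeepSpeed-MoE-derived architecture may ship custom modeling code.

```python
# Minimal loading/generation sketch -- an illustrative assumption, not part of the commit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Crystalcareai/LlaMoE-Medium"  # repo id inferred from the image URL

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # assumed precision; adjust to your hardware
    device_map="auto",
    trust_remote_code=True,       # assumed: custom MoE modeling code in the repo
)

prompt = "Explain mixture-of-experts models in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```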