Crystalcareai committed
Commit 314568f
Parent(s): 550faaf
Update README.md

README.md CHANGED
@@ -1,3 +1,12 @@
-<p align="center">
-
-
+<p align="center"> <img src="https://huggingface.co/Crystalcareai/LlaMoE-Medium/resolve/main/resources/ddb-nye2T3C3vZwJJm1l6A.png" width="350" title="LlaMoE-Medium model image"> </p>
+
+This is a 4x8b Llama Mixture of Experts (MoE) model, trained on the OpenHermes Resort subset of the Dolphin-2.9 dataset.
+
+The model is a combination of 4 Llama fine-tunes, using the DeepSpeed-MoE architecture. All experts are active for every token.
+
+This is a VERY good model, somewhere between Llama 8B and Llama 70B in capability. Enjoy!
+
+Thank you to:
+
+CrusoeEnergy for sponsoring the compute for this project
+My collaborators Eric Hartford and Fernando (has too many names) Neto
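
The updated card describes the architecture but not how to load the model, so here is a minimal usage sketch. It assumes the weights load through the standard transformers API; the repo id `Crystalcareai/LlaMoE-Medium` is inferred from the image URL above, and `trust_remote_code=True`, the bfloat16 dtype, and the generation settings are assumptions rather than documented requirements, since the DeepSpeed-MoE-derived architecture may ship custom modeling code.

```python
# Minimal loading/generation sketch -- an illustrative assumption, not part of the commit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Crystalcareai/LlaMoE-Medium"  # repo id inferred from the image URL

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # assumed precision; adjust to your hardware
    device_map="auto",
    trust_remote_code=True,       # assumed: custom MoE modeling code in the repo
)

prompt = "Explain mixture-of-experts models in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```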