neo-nlp-dev committed
Commit: 932e28f
Parent(s): d7ea010
Update README.md

README.md CHANGED
@@ -1,23 +1,28 @@
----
-library_name: transformers
-
-
+---
+library_name: transformers
+license: cc-by-4.0
+datasets:
+- uonlp/CulturaX
+---
 
 # Model Card for Model ID
 
 <!-- Provide a quick summary of what the model is/does. -->
-
+# LOLA — An Open-Source Massively Multilingual Large Language Model
+
 
 ## Model Description
 
 - **Developed by:** DICE Research Group (https://dice-research.org/) @ Paderborn University (https://www.uni-paderborn.de/)
-- **Model type:** GPT2 style (decoder-only) with alternating Mixture-of-Experts layers
+- **Model type:** GPT2 style (decoder-only) with alternating sparse Mixture-of-Experts layers
 - **Number of Experts**: 16
-- **Model Size**: 1.3 Billion
+- **Model Size**: 1.3 Billion (active) / 7.4 Billion (total) *
 - **Language(s) (NLP):** 160+
-- **License:**
+- **License:** CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
 - **Repository:** https://github.com/dice-group/LOLA
 
+<sub>* The number of parameters a model utilizes per token (ref: [Du et al, 2022](https://arxiv.org/abs/2112.06905)). This distinction is crucial for understanding the efficiency and performance of MoE models.</sub>
+
 ## How to Get Started with the Model
 
 This pre-trained (causal language modeling) model can only be used for text-generation and requires further fine-tuning on downstream tasks.

@@ -62,3 +67,17 @@ To use the top-k sampling, please set `do_sample` to `True`.
 - Training steps: 296000
 - Tokens consumed: 465 Billion
 - Training time: ~19 days
+
+## Citation
+If you use our work in your research, please make sure to cite it:
+```bibtex
+@misc{srivastava2024lolaopensourcemassively,
+  title={LOLA -- An Open-Source Massively Multilingual Large Language Model},
+  author={Nikit Srivastava and Denis Kuchelev and Tatiana Moteu Ngoli and Kshitij Shetty and Michael Roeder and Diego Moussallem and Hamada Zahera and Axel-Cyrille Ngonga Ngomo},
+  year={2024},
+  eprint={2409.11272},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  url={https://arxiv.org/abs/2409.11272},
+}
+```
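A note on the "1.3 Billion (active) / 7.4 Billion (total)" entry introduced above: the footnote's distinction between parameters stored and parameters used per token can be made concrete with a little arithmetic. The sketch below is illustrative only; the per-layer sizes and the one-expert-per-token routing are invented assumptions, and only the expert count of 16 is taken from the card.

```python
# Illustrative only: the layer sizes and routing below are invented, not LOLA's
# actual configuration; only n_experts=16 comes from the model card.
def moe_param_counts(
    n_layers: int,
    dense_params_per_layer: int,
    expert_params: int,
    n_experts: int,
    experts_per_token: int,
    moe_every_n_layers: int = 2,  # "alternating" MoE layers
) -> tuple[int, int]:
    """Return (total_params, active_params) for a toy decoder with MoE layers."""
    n_moe_layers = n_layers // moe_every_n_layers
    # Total size counts every expert, whether or not a given token uses it.
    total = n_layers * dense_params_per_layer + n_moe_layers * n_experts * expert_params
    # Active size counts only the experts a token is actually routed through.
    active = n_layers * dense_params_per_layer + n_moe_layers * experts_per_token * expert_params
    return total, active

total, active = moe_param_counts(
    n_layers=24,                        # invented
    dense_params_per_layer=50_000_000,  # invented
    expert_params=33_000_000,           # invented
    n_experts=16,                       # from the card
    experts_per_token=1,                # invented routing assumption
)
print(f"total ~ {total / 1e9:.1f}B parameters, active ~ {active / 1e9:.1f}B per token")
```

The point is only that the gap between the two numbers grows with the expert count, which is why a sparse MoE model can be far larger on disk than the cost of a single forward pass suggests.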
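Since the "How to Get Started" section referenced in the second hunk describes text generation and `do_sample` only in prose, here is a minimal generation sketch against the standard `transformers` pipeline API. The checkpoint identifier below is a placeholder assumption (this diff does not name one), and `trust_remote_code=True` is included only in case the published checkpoint ships custom modeling code.

```python
# Minimal sketch, assuming a hypothetical checkpoint id; replace MODEL_ID with
# the actual Hugging Face repository name of the published LOLA checkpoint.
from transformers import pipeline

MODEL_ID = "dice-research/lola_v1"  # placeholder assumption, not taken from the card

generator = pipeline(
    "text-generation",
    model=MODEL_ID,
    trust_remote_code=True,  # only needed if the checkpoint ships custom modeling code
)

# do_sample=True enables sampling (the card's note on top-k sampling);
# with do_sample=False the pipeline decodes greedily instead.
outputs = generator(
    "The quick brown fox",
    max_new_tokens=50,
    do_sample=True,
    top_k=50,
)
print(outputs[0]["generated_text"])
```

As the card stresses, this is a pre-trained causal language model, so a prompt like the one above yields a raw continuation; downstream tasks still require fine-tuning.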