neo-nlp-dev committed
Commit: 932e28f
Parent(s): d7ea010
Update README.md

README.md CHANGED
@@ -1,23 +1,28 @@
----
-library_name: transformers
-
-
+---
+library_name: transformers
+license: cc-by-4.0
+datasets:
+- uonlp/CulturaX
+---
 
 # Model Card for Model ID
 
 <!-- Provide a quick summary of what the model is/does. -->
-
+# LOLA — An Open-Source Massively Multilingual Large Language Model
+
 
 ## Model Description
 
 - **Developed by:** DICE Research Group (https://dice-research.org/) @ Paderborn University (https://www.uni-paderborn.de/)
-- **Model type:** GPT2 style (decoder-only) with alternating Mixture-of-Experts layers
+- **Model type:** GPT2 style (decoder-only) with alternating sparse Mixture-of-Experts layers
 - **Number of Experts**: 16
-- **Model Size**: 1.3 Billion
+- **Model Size**: 1.3 Billion (active) / 7.4 Billion (total) *
 - **Language(s) (NLP):** 160+
-- **License:**
+- **License:** CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
 - **Repository:** https://github.com/dice-group/LOLA
 
+<sub>* The number of parameters a model utilizes per token (ref: [Du et al, 2022](https://arxiv.org/abs/2112.06905)). This distinction is crucial for understanding the efficiency and performance of MoE models.</sub>
+
 ## How to Get Started with the Model
 
 This pre-trained (causal language modeling) model can only be used for text-generation and requires further fine-tuning on downstream tasks.

@@ -62,3 +67,17 @@ To use the top-k sampling, please set `do_sample` to `True`.
 - Training steps: 296000
 - Tokens consumed: 465 Billion
 - Training time: ~19 days
+
+## Citation
+If you use our work in your research, please make sure to cite it:
+```bibtex
+@misc{srivastava2024lolaopensourcemassively,
+  title={LOLA -- An Open-Source Massively Multilingual Large Language Model},
+  author={Nikit Srivastava and Denis Kuchelev and Tatiana Moteu Ngoli and Kshitij Shetty and Michael Roeder and Diego Moussallem and Hamada Zahera and Axel-Cyrille Ngonga Ngomo},
+  year={2024},
+  eprint={2409.11272},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  url={https://arxiv.org/abs/2409.11272},
+}
+```
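A note on the "1.3 Billion (active) / 7.4 Billion (total)" entry introduced above: the footnote's distinction between parameters stored and parameters used per token can be made concrete with a little arithmetic. The sketch below is illustrative only; the per-layer sizes and the one-expert-per-token routing are invented assumptions, and only the expert count of 16 is taken from the card.

```python
# Illustrative only: the layer sizes and routing below are invented, not LOLA's
# actual configuration; only n_experts=16 comes from the model card.
def moe_param_counts(
    n_layers: int,
    dense_params_per_layer: int,
    expert_params: int,
    n_experts: int,
    experts_per_token: int,
    moe_every_n_layers: int = 2,  # "alternating" MoE layers
) -> tuple[int, int]:
    """Return (total_params, active_params) for a toy decoder with MoE layers."""
    n_moe_layers = n_layers // moe_every_n_layers
    # Total size counts every expert, whether or not a given token uses it.
    total = n_layers * dense_params_per_layer + n_moe_layers * n_experts * expert_params
    # Active size counts only the experts a token is actually routed through.
    active = n_layers * dense_params_per_layer + n_moe_layers * experts_per_token * expert_params
    return total, active

total, active = moe_param_counts(
    n_layers=24,                        # invented
    dense_params_per_layer=50_000_000,  # invented
    expert_params=33_000_000,           # invented
    n_experts=16,                       # from the card
    experts_per_token=1,                # invented routing assumption
)
print(f"total ~ {total / 1e9:.1f}B parameters, active ~ {active / 1e9:.1f}B per token")
```

The point is only that the gap between the two numbers grows with the expert count, which is why a sparse MoE model can be far larger on disk than the cost of a single forward pass suggests.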
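Since the "How to Get Started" section referenced in the second hunk describes text generation and `do_sample` only in prose, here is a minimal generation sketch against the standard `transformers` pipeline API. The checkpoint identifier below is a placeholder assumption (this diff does not name one), and `trust_remote_code=True` is included only in case the published checkpoint ships custom modeling code.

```python
# Minimal sketch, assuming a hypothetical checkpoint id; replace MODEL_ID with
# the actual Hugging Face repository name of the published LOLA checkpoint.
from transformers import pipeline

MODEL_ID = "dice-research/lola_v1"  # placeholder assumption, not taken from the card

generator = pipeline(
    "text-generation",
    model=MODEL_ID,
    trust_remote_code=True,  # only needed if the checkpoint ships custom modeling code
)

# do_sample=True enables sampling (the card's note on top-k sampling);
# with do_sample=False the pipeline decodes greedily instead.
outputs = generator(
    "The quick brown fox",
    max_new_tokens=50,
    do_sample=True,
    top_k=50,
)
print(outputs[0]["generated_text"])
```

As the card stresses, this is a pre-trained causal language model, so a prompt like the one above yields a raw continuation; downstream tasks still require fine-tuning.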