Text Generation · Transformers · Safetensors · lola_v1 · custom_code
neo-nlp-dev committed on
Commit 932e28f
1 Parent(s): d7ea010

Update README.md

Files changed (1)
  1. README.md +27 -8
README.md CHANGED
@@ -1,23 +1,28 @@
- ---
- library_name: transformers
- tags: []
- ---
+ ---
+ library_name: transformers
+ license: cc-by-4.0
+ datasets:
+ - uonlp/CulturaX
+ ---

  # Model Card for Model ID

  <!-- Provide a quick summary of what the model is/does. -->
- **LOLA**: Large and Open Source Multilingual Language Model
+ # LOLA &mdash; An Open-Source Massively Multilingual Large Language Model
+

  ## Model Description

  - **Developed by:** DICE Research Group (https://dice-research.org/) @ Paderborn University (https://www.uni-paderborn.de/)
- - **Model type:** GPT2 style (decoder-only) with alternating Mixture-of-Experts layers
+ - **Model type:** GPT2 style (decoder-only) with alternating sparse Mixture-of-Experts layers
  - **Number of Experts**: 16
- - **Model Size**: 1.3 Billion Dense / 7.4 Billion Sparse
+ - **Model Size**: 1.3 Billion (active) / 7.4 Billion (total) *
  - **Language(s) (NLP):** 160+
- - **License:** Coming soon
+ - **License:** CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
  - **Repository:** https://github.com/dice-group/LOLA

+ <sub>* The number of parameters a model utilizes per token (ref: [Du et al, 2022](https://arxiv.org/abs/2112.06905)). This distinction is crucial for understanding the efficiency and performance of MoE models.</sub>
+
  ## How to Get Started with the Model

  This pre-trained (causal language modeling) model can only be used for text-generation and requires further fine-tuning on downstream tasks.
@@ -62,3 +67,17 @@ To use the top-k sampling, please set `do_sample` to `True`.
  - Training steps: 296000
  - Tokens consumed: 465 Billion
  - Training time: ~19 days
+
+ ## Citation
+ If you use our work in your research, please make sure to cite it:
+ ```bibtex
+ @misc{srivastava2024lolaopensourcemassively,
+ title={LOLA -- An Open-Source Massively Multilingual Large Language Model},
+ author={Nikit Srivastava and Denis Kuchelev and Tatiana Moteu Ngoli and Kshitij Shetty and Michael Roeder and Diego Moussallem and Hamada Zahera and Axel-Cyrille Ngonga Ngomo},
+ year={2024},
+ eprint={2409.11272},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2409.11272},
+ }
+ ```
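
For the "How to Get Started" section referenced in the diff, a minimal usage sketch is shown below. It is illustrative only: the Hub repo ID `dice-research/lola_v1` is assumed from the page's `lola_v1` tag (replace it with the actual repository ID), `trust_remote_code=True` reflects the `custom_code` tag, and the `max_new_tokens`/`top_k` values are arbitrary; only `do_sample=True` for top-k sampling comes from the card itself.

```python
# Hedged sketch: load the pre-trained LOLA checkpoint and generate text
# with top-k sampling (do_sample=True, as the model card advises).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dice-research/lola_v1"  # assumed repo ID; replace with the real one

# The custom_code tag implies custom modeling code hosted alongside the weights.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=50,  # illustrative value
    do_sample=True,     # enables sampling, required for top-k sampling
    top_k=50,           # illustrative value
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

As the card notes, this checkpoint is a pre-trained causal language model intended for text generation only; downstream tasks require further fine-tuning.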