---
language: fr
license: mit
tags:
- roberta
- language-model
- wo
- wolof
- french
---

# Soraberta: Unsupervised Language Model Pre-training for Wolof

**Soraberta** is a RoBERTa-base model pretrained on the Wolof language. RoBERTa was introduced in [this paper](https://arxiv.org/abs/1907.11692).

## Soraberta models

| Model name | Number of layers | Attention Heads | Embedding Dimension | Total Parameters |
| :------: | :---: | :---: | :---: | :---: |
| `soraberta-base` | 6 | 12 | 514 | 18 M |

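The architecture details in the table can also be read back from the published checkpoint itself. The snippet below is only a minimal sketch, assuming the `abdouaziz/soraberta` checkpoint (the one used in the usage example further down) ships a standard RoBERTa configuration:

```python
from transformers import AutoConfig

# Fetch the configuration of the hosted checkpoint; its fields mirror the table above.
config = AutoConfig.from_pretrained("abdouaziz/soraberta")

print(config.num_hidden_layers)    # number of layers
print(config.num_attention_heads)  # attention heads
print(config.hidden_size)          # embedding dimension
```
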
## Using Soraberta with Hugging Face's Transformers

```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='abdouaziz/soraberta')
>>> unmasker("juroom naari jullit man nanoo boole jend aw nag walla <mask>.")

[{'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla gileem.',
  'score': 0.9783930778503418,
  'token': 4621,
  'token_str': ' gileem'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla jend.',
  'score': 0.009271537885069847,
  'token': 2155,
  'token_str': ' jend'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla aw.',
  'score': 0.0027585660573095083,
  'token': 704,
  'token_str': ' aw'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla pel.',
  'score': 0.001120452769100666,
  'token': 1171,
  'token_str': ' pel'},
 {'sequence': 'juroom naari jullit man nanoo boole jend aw nag walla juum.',
  'score': 0.0005133090307936072,
  'token': 5820,
  'token_str': ' juum'}]
```

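The pipeline above is the quickest way to query the model. For lower-level access, the same checkpoint can also be loaded with `AutoTokenizer` and `AutoModelForMaskedLM`; the snippet below is a minimal sketch of scoring the `<mask>` position by hand (the top-5 decoding logic is illustrative and not part of the original card):

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the tokenizer and masked-LM head from the same checkpoint used above.
tokenizer = AutoTokenizer.from_pretrained("abdouaziz/soraberta")
model = AutoModelForMaskedLM.from_pretrained("abdouaziz/soraberta")

text = "juroom naari jullit man nanoo boole jend aw nag walla <mask>."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the <mask> position and keep the five most likely fillers for it.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
probs = logits[0, mask_pos].softmax(dim=-1)
top5 = probs.topk(5)

for score, token_id in zip(top5.values[0], top5.indices[0]):
    print(tokenizer.decode(int(token_id)), float(score))
```
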
## Training data

The data sources are the [Bible OT](http://biblewolof.com/) and [WOLOF-ONLINE](http://www.wolof-online.com/).

## Contact

Please contact [email protected] with any questions, feedback, or requests.