Update README.md
README.md
CHANGED
@@ -3,7 +3,13 @@
|
|
3 |
|
4 |
# Model Description
|
5 |
|
6 |
-
This model
7 |
|
8 |
## Pre-training Settings and Results on TyDi QA Development Dataset ( Model in this card is highlighted in bold )
|
9 |
|
@@ -12,10 +18,11 @@ This model adapt T5 on Arabic Language by pre-training T5 on ArabicWikipedia, Ma
|
|
12 |
| AraT5-Base | 768 | 12 | 12 | 110K |TPUv3-8 | 1M | 128 | 1.0x |248GB 29B tokens (MSA + Tweets) |
|
13 |
| AraT5-Base-MSA | 768 | 12 | 12 | 110K |TPUv3-8 | 1M | 128 | 1.0x |70GB (MSA) |
|
14 |
| AraT5-Base-Tweets| 768 | 12 | 12 | 110K |TPUv3-8 | 1M | 128 | 1.0x |178GB (Tweets) |
|
|
|
15 |
| mT5-Base | 768 | 12 | 12 | 250K |TPUv3-32 | 1M | 1024 | 8.0x |6.3T tokens (mC4)|
|
16 |
-
| ArabicT5-Base
|
17 |
-
| ArabicT5-Large
|
18 |
-
| ArabicT5-xLarge | 768 | 12 | 36 | 32K |TPUv3-128 | 500K | 512 | 2.0x |17GB (
|
19 |
|
20 |
|
21 |
## Results on TyDi QA, HARD, Sentiment Analysis, Sarcasm Detection ( Best Score is highlighted in bold )
|
@@ -26,6 +33,7 @@ This model adapt T5 on Arabic Language by pre-training T5 on ArabicWikipedia, Ma
|
|
26 |
| AraT5-Base-MSA | <center>70.90/84.00 |<center>**96.52**|<center>70.03/72.73|<center>60.69|<center>27.36|
|
27 |
| AraT5-Base-Tweets | <center>65.14/79.00 |<center>96.26|<center>70.67/73.52|<center>61.11|<center>25.08|
|
28 |
| mT5-Base | <center>72.20/84.13 |<center>96.24|<center>67.33/68.78|<center>52.18|<center>25.68|
|
|
|
29 |
| ArabicT5-Base | <center>70.79/84.76 |<center>96.36|<center>68.93/71.20|<center>58.93|<center>29.19|
|
30 |
| ArabicT5-Large | <center>73.29/86.08 |<center>96.40|<center>70.4/73.01|<center>59.79|<center>30.30|
|
31 |
| ArabicT5-xLarge | <center>**75.46/87.12** |<center>96.50| <center>**72.23/75.17**|<center>**61.66**|<center>**31.70**|
|
@@ -41,9 +49,9 @@ In our XL-Sum results, although we show that AraT5-Base exceeded our ArabicT5-La
|
|
41 |
# Speedup Results
|
42 |
|
43 |
|
44 |
-
Below are our speedup results on the TyDi QA dataset where all models
|
45 |
|
46 |
-
Please note these results when we fixed our hyperparameters for all models.
|
47 |
|
48 |
|
49 |
| <center>Model | <center>Run Time (hh:mm:ss) | <center>Results on TyDi QA |
|
@@ -52,6 +60,7 @@ Please note these results when we fixed our hyperparameters for all models. To g
|
|
52 |
| AraT5-Base | <center>00:20:53 |<center>68.40/81.97|
|
53 |
| AraT5-Base-Tweets | <center>00:21:17 |<center>61.67/75.96|
|
54 |
| mT5-Base | <center>00:28:24 |<center>57.98/72.81|
|
|
|
55 |
| ArabicT5-Base | <center>00:20:00 |<center>70.79/83.85|
|
56 |
| ArabicT5-Large | <center>00:23:50 |<center>71.22/84.42|
|
57 |
| ArabicT5-xLarge | <center>00:52:17 |<center>72.86/86.00|
|
|
|
3 |
|
4 |
# Model Description
|
5 |
|
6 |
+
This model adapts T5 to the Arabic language by pre-training T5 on:
|
7 |
+
- Arabic Wikipedia.
|
8 |
+
- Marefa encyclopedia.
|
9 |
+
- Hindawi Books.
|
10 |
+
- A collection of Arabic news.
|
11 |
+
|
12 |
+
The total corpus size is 17GB. We restrict our corpora to news and encyclopedias to improve the model's performance on informative tasks such as factoid question answering and generative tasks that use Classical Arabic ( الفصحى ). This also gives our models an advantage if you don't want the generated text to contain inappropriate language. This model uses an efficient implementation of T5 that reduces fine-tuning time and memory usage [Link](https://arxiv.org/abs/2109.10686).
|
13 |
|
14 |
## Pre-training Settings and Results on TyDi QA Development Dataset ( Model in this card is highlighted in bold )
|
15 |
|
|
|
18 |
| AraT5-Base | 768 | 12 | 12 | 110K |TPUv3-8 | 1M | 128 | 1.0x |248GB 29B tokens (MSA + Tweets) |
|
19 |
| AraT5-Base-MSA | 768 | 12 | 12 | 110K |TPUv3-8 | 1M | 128 | 1.0x |70GB (MSA) |
|
20 |
| AraT5-Base-Tweets| 768 | 12 | 12 | 110K |TPUv3-8 | 1M | 128 | 1.0x |178GB (Tweets) |
|
21 |
+
| AraBART-Base | 768 | 12 | 12 | 50K | 128 V100 GPUs (60h) |25 epochs| - | - |73GB (MSA) |
|
22 |
| mT5-Base | 768 | 12 | 12 | 250K |TPUv3-32 | 1M | 1024 | 8.0x |6.3T tokens (mC4)|
|
23 |
+
| ArabicT5-Base | 512 | 8 | 20 | 32K |TPUv3-32 | 256K | 256 | 0.5x |17GB (Wiki + Hindawi + Marefa + News) |
|
24 |
+
| ArabicT5-Large | 768 | 12 | 16 | 32K |TPUv3-128 | 500K | 512 | 2.0x |17GB (Wiki + Hindawi + Marefa + News) |
|
25 |
+
| ArabicT5-xLarge | 768 | 12 | 36 | 32K |TPUv3-128 | 500K | 512 | 2.0x |17GB (Wiki + Hindawi + Marefa + News) |
|
26 |
|
27 |
|
28 |
## Results on TyDi QA, HARD, Sentiment Analysis, Sarcasm Detection ( Best Score is highlighted in bold )
|
|
|
33 |
| AraT5-Base-MSA | <center>70.90/84.00 |<center>**96.52**|<center>70.03/72.73|<center>60.69|<center>27.36|
|
34 |
| AraT5-Base-Tweets | <center>65.14/79.00 |<center>96.26|<center>70.67/73.52|<center>61.11|<center>25.08|
|
35 |
| mT5-Base | <center>72.20/84.13 |<center>96.24|<center>67.33/68.78|<center>52.18|<center>25.68|
|
36 |
+
| AraBART-Base | <center>48.75/71.15 |<center>96.11|<center>66.23/68.18|<center>56.30|<center>31.20|
|
37 |
| ArabicT5-Base | <center>70.79/84.76 |<center>96.36|<center>68.93/71.20|<center>58.93|<center>29.19|
|
38 |
| ArabicT5-Large | <center>73.29/86.08 |<center>96.40|<center>70.4/73.01|<center>59.79|<center>30.30|
|
39 |
| ArabicT5-xLarge | <center>**75.46/87.12** |<center>96.50| <center>**72.23/75.17**|<center>**61.66**|<center>**31.70**|
|
|
|
49 |
# Speedup Results
|
50 |
|
51 |
|
52 |
+
Below are our speedup results on the TyDi QA dataset, where all models were fine-tuned for 13 epochs with a learning rate of 2e-4 and a batch size of 3 per TPU device (TPUv3-8, so the total batch size is 3 x 8 = 24).
|
53 |
|
54 |
+
Please note that these results were obtained with the same fixed hyperparameters for all models. Refer to the table above for the best results obtained after a grid search.
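The fixed fine-tuning setup described above can be written down as a small configuration object. This is only a minimal sketch: the class and field names (`FineTuneConfig`, `per_device_batch_size`, `num_devices`) are hypothetical and not taken from the authors' training code. Only the values (13 epochs, learning rate 2e-4, batch size 3 per device, 8 devices) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the fixed hyperparameters quoted in the text."""

    epochs: int = 13
    learning_rate: float = 2e-4
    per_device_batch_size: int = 3
    num_devices: int = 8  # assumption: one shard per core of the 8-core TPU

    @property
    def global_batch_size(self) -> int:
        # Total examples per optimization step across all devices.
        return self.per_device_batch_size * self.num_devices


config = FineTuneConfig()
print(config.global_batch_size)  # 3 x 8 = 24
```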
|
55 |
|
56 |
|
57 |
| <center>Model | <center>Run Time (hh:mm:ss) | <center>Results on TyDi QA |
|
|
|
60 |
| AraT5-Base | <center>00:20:53 |<center>68.40/81.97|
|
61 |
| AraT5-Base-Tweets | <center>00:21:17 |<center>61.67/75.96|
|
62 |
| mT5-Base | <center>00:28:24 |<center>57.98/72.81|
|
63 |
+
| AraBART-Base | <center>00:10:57 |<center>43.76/66.30|
|
64 |
| ArabicT5-Base | <center>00:20:00 |<center>70.79/83.85|
|
65 |
| ArabicT5-Large | <center>00:23:50 |<center>71.22/84.42|
|
66 |
| ArabicT5-xLarge | <center>00:52:17 |<center>72.86/86.00|
|
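To compare the run times in the table above more directly, they can be converted to seconds and expressed relative to one model. The sketch below is illustrative only: the run times are copied from the rows shown above, while the helper names (`to_seconds`, `relative_run_times`) and the choice of ArabicT5-Base as the baseline are assumptions made for this example.

```python
# Run times (hh:mm:ss) copied from the rows of the speedup table shown above.
RUN_TIMES = {
    "AraT5-Base": "00:20:53",
    "AraT5-Base-Tweets": "00:21:17",
    "mT5-Base": "00:28:24",
    "AraBART-Base": "00:10:57",
    "ArabicT5-Base": "00:20:00",
    "ArabicT5-Large": "00:23:50",
    "ArabicT5-xLarge": "00:52:17",
}


def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def relative_run_times(run_times: dict, baseline: str) -> dict:
    """Return each model's run time as a multiple of the baseline's run time."""
    base = to_seconds(run_times[baseline])
    return {name: round(to_seconds(t) / base, 2) for name, t in run_times.items()}


relative = relative_run_times(RUN_TIMES, "ArabicT5-Base")
for name, ratio in relative.items():
    print(f"{name}: {ratio:.2f}x")
```

A ratio above 1 means the model took longer to fine-tune than the baseline under the same fixed hyperparameters. The ratio says nothing about accuracy, which is listed in the last column of the table.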