Update README.md

README.md

@@ -9,15 +9,9 @@ This model adapts T5 on the Arabic Language by pre-training T5 on :
- Hindawi Books.
- a collection of Arabic News.

-Total Corpora size is 17GB.
+Total Corpora size is 17GB. This model uses an efficient implementation of T5 that reduces fine-tuning time and memory usage [Link](https://arxiv.org/abs/2109.10686) and uses T5x for pre-training [Link](https://github.com/google-research/t5x).

-```diff
-- We changed the name of our model to match the original paper's naming (https://arxiv.org/abs/2109.10686); refer to page 8, Table 4.
-
-ArabicT5-Base --> ArabicT5-17GB-small
-ArabicT5-Large --> ArabicT5-17GB-base
-ArabicT5-xLarge --> ArabicT5-17GB-large
-```
## Pre-training Settings and Results on TyDi QA Development Dataset (Model in this card is highlighted in bold)

| Model | Hidden Layer | Atten. head | Atten. Layers | Vocab | Hardware | Training Steps | Batch | Train x Batch Factor | Corpora |

@@ -28,7 +22,9 @@
| AraBART-base | 768 | 12 | 12 | 50K | 128 V100 GPUs (60h) | 25 epochs | - | - | 73GB (MSA) |
| mT5-base | 768 | 12 | 12 | 250K | TPUv3-32 | 1M | 1024 | 8.0x | 6.3T tokens (mC4) |
| ArabicT5-17GB-small | 512 | 8 | 20 | 32K | TPUv3-32 | 256K | 256 | 0.5x | 17GB (MSA) |
+| ArabicT5-49GB-small | 512 | 8 | 16 | 32K | TPUv3-64 | 500K | 256 | 1.0x | 49GB (MSA + OSCAR) |
| ArabicT5-17GB-base | 768 | 12 | 16 | 32K | TPUv3-128 | 500K | 512 | 2.0x | 17GB (MSA) |
+| ArabicT5-49GB-base | 768 | 12 | 16 | 32K | TPUv3-64 | 500K | 256 | 1.0x | 49GB (MSA + OSCAR) |
| ArabicT5-17GB-large | 768 | 12 | 36 | 32K | TPUv3-128 | 500K | 512 | 2.0x | 17GB (MSA) |
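A note on the "Train x Batch Factor" column, which the card does not define: it appears to be training steps times batch size, normalized to the 500K-step, 256-batch budget (so that budget equals 1.0x). A quick check of that reading against the rows above:

```python
# Hedged reading of "Train x Batch Factor": (steps * batch) / (500K * 256).
# This is our interpretation, not something stated explicitly in the card.
budget = 500_000 * 256

for name, steps, batch in [
    ("mT5-base", 1_000_000, 1024),
    ("ArabicT5-17GB-small", 256_000, 256),
    ("ArabicT5-17GB-base", 500_000, 512),
]:
    print(name, round(steps * batch / budget, 1))  # -> 8.0, 0.5, 2.0
```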
@@ -42,7 +38,9 @@
| mT5-base | <center>72.2/84.1 | <center>96.2 | <center>67.3/68.8 | <center>52.2 | <center>25.7 |
| AraBART-base | <center>48.8/71.2 | <center>96.1 | <center>66.2/68.2 | <center>56.3 | <center>31.2 |
| ArabicT5-17GB-small | <center>70.8/84.8 | <center>96.4 | <center>68.9/71.2 | <center>58.9 | <center>29.2 |
+| ArabicT5-49GB-small | <center>72.4/85.1 | <center>96.4 | <center>70.2/73.4 | <center>61.0 | <center>30.2 |
| ArabicT5-17GB-base | <center>73.3/86.1 | <center>96.4 | <center>70.4/73.0 | <center>59.8 | <center>30.3 |
+| ArabicT5-49GB-base | <center>72.1/85.1 | <center>96.5 | <center>71.3/74.1 | <center>60.4 | <center>30.9 |
| ArabicT5-17GB-large | <center>**75.5/87.1** | <center>**96.5** | <center>**72.2/75.2** | <center>**61.7** | <center>**31.7** |

Evaluation Metrics: TyDi QA (EM/F1), HARD (Accuracy), Sentiment Analysis (Accuracy / F1-PN positive-negative), Sarcasm Detection (F1-sarcastic), XL-SUM (Rouge-L with Stemmer).
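On the last metric, Rouge-L scores depend on whether a stemmer is applied. Below is a minimal sketch using the `rouge_score` package (whose built-in stemmer is the English Porter stemmer) purely to illustrate how the `use_stemmer` flag changes Rouge-L; the official XL-Sum evaluation ships its own scoring code, and the example strings are placeholders:

```python
from rouge_score import rouge_scorer

# The same prediction/reference pair scored with and without stemming,
# to show why stemmed Rouge-L can come out higher.
prediction = "the players scored goals in the final matches"
reference = "the player scored a goal in the final match"

for use_stemmer in (False, True):
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=use_stemmer)
    rouge_l = scorer.score(reference, prediction)["rougeL"]
    print(f"use_stemmer={use_stemmer}: Rouge-L F1 = {rouge_l.fmeasure:.3f}")
```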
@@ -51,33 +49,12 @@ You can download the full details of our grid search for all models in all tasks

For the XL-Sum task, we choose the best run for each model using the eval set. We use the official evaluation script from XL-Sum, which applies a stemmer function and may therefore show better results than papers that do not use it; the official XL-Sum paper itself uses the stemmer function.

-In our XL-Sum results, although we show AraT5-Base exceeding our ArabicT5-Large, in most runs our ArabicT5-Large gives better results, as you can see from our grid search file.
-
-Below are our speedup results on the TyDi QA dataset, where all models were fine-tuned for 13 epochs with a learning rate of 2e-4 and a batch size of 3 per TPU device (TPUv3-8: batch = 3x8 = 24).
-
-Please note that these results were obtained with the same fixed hyperparameters for all models. Refer to the table above for the best results after a grid search.
-
-| <center>Model | <center>Run Time (hh:mm:ss) | <center>Results on TyDi QA |
-|----------------------|---------------|---------------------|
-| AraT5-msa-base | <center>00:20:41 | <center>69.92/82.50 |
-| AraT5-base | <center>00:20:53 | <center>68.40/81.97 |
-| AraT5-base-Tweets | <center>00:21:17 | <center>61.67/75.96 |
-| mT5-base | <center>00:28:24 | <center>57.98/72.81 |
-| AraBART-base | <center>00:10:57 | <center>43.76/66.30 |
-| ArabicT5-17GB-small | <center>00:20:00 | <center>70.79/83.85 |
-| ArabicT5-17GB-base | <center>00:23:50 | <center>71.22/84.42 |
-| ArabicT5-17GB-large | <center>00:52:17 | <center>72.86/86.00 |
-
-Please note that we can further speed up our ArabicT5-Base by increasing the batch size, since it can handle a larger batch size than other base-scale models due to its hidden layer size (512).
-
-# Paper
-
-[Generative Approach for Gender-Rewriting Task with ArabicT5](https://aclanthology.org/2022.wanlp-1.55/)
+# Continual Pre-Training of ArabicT5 with T5x
+
+If you want to continue pre-training ArabicT5 on your own data, we have uploaded the raw T5x checkpoint to this link: https://huggingface.co/sultan/ArabicT5-49GB-base/blob/main/arabict5_49GB_base_t5x.tar.gz
+
+We will soon share a tutorial on how you can do that for free with a Kaggle TPU.

# FineTuning our ArabicT5 model on generative and abstractive tasks with FLAX ###
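For the continual pre-training checkpoint linked in the added section above, a minimal sketch of fetching and unpacking the T5x archive with `huggingface_hub`; the repo id and filename come from that link, while the target directory name is an arbitrary choice:

```python
import tarfile
from huggingface_hub import hf_hub_download

# Download the raw T5x checkpoint archive from the model repo, then unpack it
# locally so a T5x run can use it as its initial checkpoint.
archive_path = hf_hub_download(
    repo_id="sultan/ArabicT5-49GB-base",
    filename="arabict5_49GB_base_t5x.tar.gz",
)
with tarfile.open(archive_path, "r:gz") as tar:
    tar.extractall("arabict5_49GB_base_t5x")  # arbitrary local directory name
```

A T5x pre-training or fine-tuning run would then typically point its initial-checkpoint setting at the unpacked checkpoint directory.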
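Before moving to the fine-tuning notebooks in the repository referenced below, a quick sanity check that a checkpoint loads in FLAX can look like the sketch below; the checkpoint id and example strings are placeholders, and it assumes the repo provides Flax-loadable weights:

```python
from transformers import AutoTokenizer, FlaxT5ForConditionalGeneration

# Placeholder checkpoint id; any ArabicT5 checkpoint from the tables above
# should work the same way.
model_id = "sultan/ArabicT5-49GB-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = FlaxT5ForConditionalGeneration.from_pretrained(model_id)

# Toy Arabic input/target pair, only to confirm the forward pass and output shapes.
enc = tokenizer("مثال على نص عربي قصير", return_tensors="np")
dec = tokenizer("ملخص", return_tensors="np")

outputs = model(
    input_ids=enc.input_ids,
    attention_mask=enc.attention_mask,
    decoder_input_ids=dec.input_ids,  # a real training loop would shift labels right
)
print(outputs.logits.shape)  # (batch, target_length, vocab_size)
```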
@@ -92,7 +69,12 @@ https://github.com/salrowili/ArabicT5

# Acknowledgment

-We
+We want to acknowledge the support we received from the TPU Research Cloud (TRC) team, who granted us access to TPUv3 units.
+
+# Paper
+
+[Generative Approach for Gender-Rewriting Task with ArabicT5](https://aclanthology.org/2022.wanlp-1.55/)

# Citation