sultan committed on
Commit 0272800
1 Parent(s): b2fc714

Update README.md

Files changed (1)
  1. README.md +14 -32
README.md CHANGED
@@ -9,15 +9,9 @@ This model adapts T5 on the Arabic Language by pre-training T5 on :
  - Hindawi Books.
  - a collection of Arabic News.
 
- Total Corpora size is 17GB. We restrict our corpora to News and Encyclopedias to enhance the performance of the model on informative tasks such as Factoid Question Answering and generative tasks that use classic Arabic ( الفصحى ). This also gives our models an advantage if you don't want the generated text to contain inappropriate language. This model uses an efficient implementation of T5 which reduces fine-tuning time and memory usage [Link](https://arxiv.org/abs/2109.10686) .
-
- ```diff
- - We changed the names of our models to match the original paper's naming (https://arxiv.org/abs/2109.10686); refer to page 8, Table 4.
-
- ArabicT5-Base --> ArabicT5-17GB-small
- ArabicT5-Large --> ArabicT5-17GB-base
- ArabicT5-xLarge --> ArabicT5-17GB-large
- ```
+ Total Corpora size is 17GB. This model uses an efficient implementation of T5 which reduces fine-tuning time and memory usage [Link](https://arxiv.org/abs/2109.10686) and uses T5x for pre-training [Link](https://github.com/google-research/t5x).
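As a rough illustration of loading one of these checkpoints with the Hugging Face transformers library (a minimal sketch; the repository id below is taken from the checkpoint link later in this card and may not be the exact model on this page):

```python
# Minimal usage sketch; the repo id is an assumption taken from a link later in
# this card -- substitute the checkpoint you actually want to use.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "sultan/ArabicT5-49GB-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

inputs = tokenizer("نص عربي للتجربة", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```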
  ## Pre-training Settings and Results on TyDi QA Development Dataset ( Model in this card is highlighted in bold )
 
  | Model | Hidden Layer | Atten. head | Atten. Layers | Vocab | Hardware |Training Steps | Batch | Train x Batch Factor |Corpora |
@@ -28,7 +22,9 @@ ArabicT5-xLarge --> ArabicT5-17GB-large
  | AraBART-base | 768 | 12 | 12 | 50K | 128 V100 GPUs (60h) |25 epochs| - | - |73GB (MSA) |
  | mT5-base | 768 | 12 | 12 | 250K |TPUv3-32 | 1M | 1024 | 8.0x |6.3T tokens (mC4)|
  | ArabicT5-17GB-small | 512 | 8 | 20 | 32K |TPUv3-32 | 256K | 256 | 0.5x |17GB (MSA) |
+ | ArabicT5-49GB-small | 512 | 8 | 16 | 32K |TPUv3-64 | 500K | 256 | 1.0x |49GB (MSA + OSCAR) |
  | ArabicT5-17GB-base | 768 | 12 | 16 | 32K |TPUv3-128 | 500K | 512 | 2.0x |17GB (MSA) |
+ | ArabicT5-49GB-base | 768 | 12 | 16 | 32K |TPUv3-64 | 500K | 256 | 1.0x |49GB (MSA + OSCAR) |
  | ArabicT5-17GB-large | 768 | 12 | 36 | 32K |TPUv3-128 | 500K | 512 | 2.0x |17GB (MSA) |
 
@@ -42,7 +38,9 @@ ArabicT5-xLarge --> ArabicT5-17GB-large
  | mT5-base | <center>72.2/84.1 |<center>96.2|<center>67.3/68.8|<center>52.2|<center>25.7|
  | AraBART-base | <center>48.8/71.2 |<center>96.1|<center>66.2/68.2|<center>56.3|<center>31.2|
  | ArabicT5-17GB-small | <center>70.8/84.8 |<center>96.4|<center>68.9/71.2|<center>58.9|<center>29.2|
+ | ArabicT5-49GB-small | <center>72.4/85.1 |<center>96.4|<center>70.2/73.4|<center>61.0|<center>30.2|
  | ArabicT5-17GB-base | <center>73.3/86.1 |<center>96.4|<center>70.4/73.0|<center>59.8|<center>30.3|
+ | ArabicT5-49GB-base | <center>72.1/85.1 |<center>96.5|<center>71.3/74.1|<center>60.4|<center>30.9|
  | ArabicT5-17GB-large | <center>**75.5/87.1** |<center>**96.5**| <center>**72.2/75.2**|<center>**61.7**|<center>**31.7**|
 
  Evaluation Metrics: TyDi QA (EM/F1), HARD (Accuracy), Sentiment Analysis (Accuracy / F1-PN positive-negative), Sarcasm Detection (F1-sarcastic), XL-SUM (Rouge-L with Stemmer).
@@ -51,33 +49,12 @@ You can download the full details of our grid search for all models in all tasks
 
  For the XL-Sum task, we choose our best run for each model using the eval set. We use the official evaluation script from XL-Sum, which uses the stemmer function and may therefore show better results than papers that don't use the stemmer function. The official XL-Sum paper uses a stemmer function.
 
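For context, "Rouge-L with Stemmer" refers to scoring with the stemmer option enabled. A minimal sketch with Google's rouge_score package is below; it is illustrative only (its default stemmer targets English), and the official XL-Sum evaluation script should be used for reported numbers:

```python
# Illustrative only: shows the use_stemmer toggle in the rouge_score package.
# The official XL-Sum evaluation script should be used for the reported scores.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
reference = "الملخص المرجعي هنا"   # gold summary
prediction = "الملخص المتوقع هنا"  # model output
print(scorer.score(reference, prediction)["rougeL"].fmeasure)
```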
- In our XL-Sum results, although we show that AraT5-Base exceeded our ArabicT5-Large, in most runs our ArabicT5-Large shows better results, as you can see from our grid search file.
-
- # Speedup Results
-
- Below are our speedup results on the TyDi QA dataset, where all models were fine-tuned for 13 epochs with a learning rate of 2e-4 and a batch size of 3 per TPU device (TPUv3-8: batch = 3x8 -> 24).
-
- Please note that these are the results when we fixed the hyperparameters for all models. Refer to the table above for the best results after a grid search.
-
- | <center>Model | <center>Run Time (hh:mm:ss) | <center>Results on TyDi QA |
- |----------------------|---------------|---------------------|
- | AraT5-msa-base | <center>00:20:41 |<center>69.92/82.50|
- | AraT5-base | <center>00:20:53 |<center>68.40/81.97|
- | AraT5-base-Tweets | <center>00:21:17 |<center>61.67/75.96|
- | mT5-base | <center>00:28:24 |<center>57.98/72.81|
- | AraBART-base | <center>00:10:57 |<center>43.76/66.30|
- | ArabicT5-17GB-small | <center>00:20:00 |<center>70.79/83.85|
- | ArabicT5-17GB-base | <center>00:23:50 |<center>71.22/84.42|
- | ArabicT5-17GB-large | <center>00:52:17 |<center>72.86/86.00|
-
- Please note that we can further speed up our ArabicT5-Base by increasing the batch size, since it can handle a larger batch size than other base-scale models due to its hidden layer size (512).
-
- # Paper
-
- [Generative Approach for Gender-Rewriting Task with ArabicT5](https://aclanthology.org/2022.wanlp-1.55/)
+ # Continual Pre-Training of ArabicT5 with T5x
+ If you want to continue pre-training ArabicT5 on your own data, we have uploaded the raw T5x checkpoint to this link: https://huggingface.co/sultan/ArabicT5-49GB-base/blob/main/arabict5_49GB_base_t5x.tar.gz
+ We will soon share a tutorial on how you can do that for free with a Kaggle TPU.
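A minimal sketch of fetching and unpacking that T5x checkpoint with the huggingface_hub client (the extraction directory is an arbitrary choice):

```python
# Sketch: download and unpack the raw T5x checkpoint referenced above.
# The output directory name is arbitrary; point your T5x config's initial
# checkpoint path at the extracted directory.
import tarfile
from huggingface_hub import hf_hub_download

archive = hf_hub_download(
    repo_id="sultan/ArabicT5-49GB-base",
    filename="arabict5_49GB_base_t5x.tar.gz",
)
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall("arabict5_49GB_base_t5x")
```
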
  # FineTuning our ArabicT5 model on generative and abstractive tasks with FLAX ###
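A minimal sketch of loading the model in Flax for this kind of fine-tuning (the repository linked below has the full notebooks; the checkpoint id here is an assumption):

```python
# Sketch: load an ArabicT5 checkpoint in Flax. The repo id is an assumption;
# pass from_pt=True if the repository only ships PyTorch weights.
from transformers import AutoTokenizer, FlaxT5ForConditionalGeneration

model_id = "sultan/ArabicT5-49GB-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = FlaxT5ForConditionalGeneration.from_pretrained(model_id)

batch = tokenizer(["جملة عربية للتجربة"], return_tensors="np", padding=True)
outputs = model.generate(batch["input_ids"], max_length=32)
print(tokenizer.batch_decode(outputs.sequences, skip_special_tokens=True))
```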
 
@@ -92,7 +69,12 @@ https://github.com/salrowili/ArabicT5
 
  # Acknowledgment
 
- We would like to acknowledge the support we have from The TPU Research Cloud (TRC) team to grant us access to TPUv3 units.
+ We want to acknowledge the support of The TPU Research Cloud (TRC) team for granting us access to TPUv3 units.
+
+
+ # Paper
+
+ [Generative Approach for Gender-Rewriting Task with ArabicT5](https://aclanthology.org/2022.wanlp-1.55/)
 
  # Citation
98
 
 
9
  - Hindawi Books.
10
  - a collection of Arabic News.
11
 
12
+ Total Corpora size is 17GB. This model uses an efficient implementation of T5 which reduces the fine-tuning and memory used [Link](https://arxiv.org/abs/2109.10686) and uses T5x for pre-training [Link](https://github.com/google-research/t5x)
13
 
 
 
14
 
 
 
 
 
15
  ## Pre-training Settings and Results on TyDi QA Development Dataset ( Model in this card is highlighted in bold )
16
 
17
  | Model | Hidden Layer | Atten. head | Atten. Layers | Vocab | Hardware |Training Steps | Batch | Train x Batch Factor |Corpora |
 
22
  | AraBART-base | 768 | 12 | 12 | 50K | 128 V100 GPUs (60h) |25 epochs| - | - |73GB (MSA) |
23
  | mT5-base | 768 | 12 | 12 | 250K |TPUv3-32 | 1M | 1024 | 8.0x |6.3T tokens (mC4)|
24
  | ArabicT5-17GB-small | 512 | 8 | 20 | 32K |TPUv3-32 | 256K | 256 | 0.5x |17GB (MSA) |
25
+ | ArabicT5-49GB-small | 512 | 8 | 16 | 32K |TPUv3-64 | 500K | 256 | 1.0x |49GB (MSA + OSCAR) |
26
  | ArabicT5-17GB-base | 768 | 12 | 16 | 32K |TPUv3-128 | 500K | 512 | 2.0x |17GB (MSA) |
27
+ | ArabicT5-49GB-base | 768 | 12 | 16 | 32K |TPUv3-64 | 500K | 256 | 1.0x |49GB (MSA + OSCAR) |
28
  | ArabicT5-17GB-large | 768 | 12 | 36 | 32K |TPUv3-128 | 500K | 512 | 2.0x |17GB (MSA) |
29
 
30
 
 
38
  | mT5-base | <center>72.2/84.1 |<center>96.2|<center>67.3/68.8|<center>52.2|<center>25.7|
39
  | AraBART-base | <center>48.8/71.2 |<center>96.1|<center>66.2/68.2|<center>56.3|<center>31.2|
40
  | ArabicT5-17GB-small | <center>70.8/84.8 |<center>96.4|<center>68.9/71.2|<center>58.9|<center>29.2|
41
+ | ArabicT5-49GB-small | <center>72.4/85.1 |<center>96.4|<center>70.2/73.4|<center>61.0|<center>30.2|
42
  | ArabicT5-17GB-base | <center>73.3/86.1 |<center>96.4|<center>70.4/73.0|<center>59.8|<center>30.3|
43
+ | ArabicT5-49GB-base | <center>72.1/85.1 |<center>96.5|<center>71.3/74.1|<center>60.4|<center>30.9|
44
  | ArabicT5-17GB-large | <center>**75.5/87.1** |<center>**96.5**| <center>**72.2/75.2**|<center>**61.7**|<center>**31.7**|
45
 
46
  Evaluation Metrics: TyDi QA (EM/F1), HARD (Accuracy), Sentiment Analysis (Accuracy / F1-PN positive-negative), Sarcasm Detection (F1-sarcastic), XL-SUM (Rouge-L with Stemmer).
 
49
 
50
  For the XL-Sum task, we choose our best run for each model using the eval set. We use the official evaluation script from XL-Sum, which uses the stemmer function, which may show better results than papers that don't use the stemmer function. The official XL-Sum paper uses a stemmer function.
51
 
 
52
 
53
+ # Continual Pre-Training of ArabicT5 with T5x
54
+ if you want to continue pre-training ArabicT5 on your own data, we have uploaded the raw t5x checkpoint to this link https://huggingface.co/sultan/ArabicT5-49GB-base/blob/main/arabict5_49GB_base_t5x.tar.gz
55
+ We will soon share a tutorial on how you can do that for free with Kaggle TPU
56
 
57
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
58
 
59
  # FineTuning our ArabicT5 model on generative and abstractive tasks with FLAX ###
60
 
 
69
 
70
  # Acknowledgment
71
 
72
+ We want to acknowledge the support we have from The TPU Research Cloud (TRC) team to grant us access to TPUv3 units.
73
+
74
+
75
+ # Paper
76
+
77
+ [Generative Approach for Gender-Rewriting Task with ArabicT5](https://aclanthology.org/2022.wanlp-1.55/)
78
 
79
  # Citation
80