Update README.md

README.md

@@ -9,15 +9,9 @@ This model adapts T5 on the Arabic Language by pre-training T5 on :
- Hindawi Books.
- a collection of Arabic News.

-Total Corpora size is 17GB.
+Total Corpora size is 17GB. This model uses an efficient implementation of T5 that reduces fine-tuning time and memory usage [Link](https://arxiv.org/abs/2109.10686) and uses T5x for pre-training [Link](https://github.com/google-research/t5x).

-```diff
-- We changed the name of our model to match the original paper's naming (https://arxiv.org/abs/2109.10686); refer to page 8, Table 4.
-
-ArabicT5-Base --> ArabicT5-17GB-small
-ArabicT5-Large --> ArabicT5-17GB-base
-ArabicT5-xLarge --> ArabicT5-17GB-large
-```
## Pre-training Settings and Results on TyDi QA Development Dataset (Model in this card is highlighted in bold)

| Model | Hidden Layer | Atten. head | Atten. Layers | Vocab | Hardware | Training Steps | Batch | Train x Batch Factor | Corpora |

@@ -28,7 +22,9 @@
| AraBART-base | 768 | 12 | 12 | 50K | 128 V100 GPUs (60h) | 25 epochs | - | - | 73GB (MSA) |
| mT5-base | 768 | 12 | 12 | 250K | TPUv3-32 | 1M | 1024 | 8.0x | 6.3T tokens (mC4) |
| ArabicT5-17GB-small | 512 | 8 | 20 | 32K | TPUv3-32 | 256K | 256 | 0.5x | 17GB (MSA) |
+| ArabicT5-49GB-small | 512 | 8 | 16 | 32K | TPUv3-64 | 500K | 256 | 1.0x | 49GB (MSA + OSCAR) |
| ArabicT5-17GB-base | 768 | 12 | 16 | 32K | TPUv3-128 | 500K | 512 | 2.0x | 17GB (MSA) |
+| ArabicT5-49GB-base | 768 | 12 | 16 | 32K | TPUv3-64 | 500K | 256 | 1.0x | 49GB (MSA + OSCAR) |
| ArabicT5-17GB-large | 768 | 12 | 36 | 32K | TPUv3-128 | 500K | 512 | 2.0x | 17GB (MSA) |
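A note on the "Train x Batch Factor" column, which the card does not define: it appears to be training steps times batch size, normalized to the 500K-step, 256-batch budget (so that budget equals 1.0x). A quick check of that reading against the rows above:

```python
# Hedged reading of "Train x Batch Factor": (steps * batch) / (500K * 256).
# This is our interpretation, not something stated explicitly in the card.
budget = 500_000 * 256

for name, steps, batch in [
    ("mT5-base", 1_000_000, 1024),
    ("ArabicT5-17GB-small", 256_000, 256),
    ("ArabicT5-17GB-base", 500_000, 512),
]:
    print(name, round(steps * batch / budget, 1))  # -> 8.0, 0.5, 2.0
```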
@@ -42,7 +38,9 @@
| mT5-base | <center>72.2/84.1 | <center>96.2 | <center>67.3/68.8 | <center>52.2 | <center>25.7 |
| AraBART-base | <center>48.8/71.2 | <center>96.1 | <center>66.2/68.2 | <center>56.3 | <center>31.2 |
| ArabicT5-17GB-small | <center>70.8/84.8 | <center>96.4 | <center>68.9/71.2 | <center>58.9 | <center>29.2 |
+| ArabicT5-49GB-small | <center>72.4/85.1 | <center>96.4 | <center>70.2/73.4 | <center>61.0 | <center>30.2 |
| ArabicT5-17GB-base | <center>73.3/86.1 | <center>96.4 | <center>70.4/73.0 | <center>59.8 | <center>30.3 |
+| ArabicT5-49GB-base | <center>72.1/85.1 | <center>96.5 | <center>71.3/74.1 | <center>60.4 | <center>30.9 |
| ArabicT5-17GB-large | <center>**75.5/87.1** | <center>**96.5** | <center>**72.2/75.2** | <center>**61.7** | <center>**31.7** |

Evaluation Metrics: TyDi QA (EM/F1), HARD (Accuracy), Sentiment Analysis (Accuracy / F1-PN positive-negative), Sarcasm Detection (F1-sarcastic), XL-SUM (Rouge-L with Stemmer).
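On the last metric, Rouge-L scores depend on whether a stemmer is applied. Below is a minimal sketch using the `rouge_score` package (whose built-in stemmer is the English Porter stemmer) purely to illustrate how the `use_stemmer` flag changes Rouge-L; the official XL-Sum evaluation ships its own scoring code, and the example strings are placeholders:

```python
from rouge_score import rouge_scorer

# The same prediction/reference pair scored with and without stemming,
# to show why stemmed Rouge-L can come out higher.
prediction = "the players scored goals in the final matches"
reference = "the player scored a goal in the final match"

for use_stemmer in (False, True):
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=use_stemmer)
    rouge_l = scorer.score(reference, prediction)["rougeL"]
    print(f"use_stemmer={use_stemmer}: Rouge-L F1 = {rouge_l.fmeasure:.3f}")
```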
@@ -51,33 +49,12 @@ You can download the full details of our grid search for all models in all tasks

For the XL-Sum task, we choose the best run for each model using the eval set. We use the official evaluation script from XL-Sum, which applies a stemmer function and may therefore show better results than papers that do not use it; the official XL-Sum paper itself uses the stemmer function.

-In our XL-Sum results, although we show AraT5-Base exceeding our ArabicT5-Large, in most runs our ArabicT5-Large gives better results, as you can see from our grid search file.
-
-Below are our speedup results on the TyDi QA dataset, where all models were fine-tuned for 13 epochs with a learning rate of 2e-4 and a batch size of 3 per TPU device (TPUv3-8: batch = 3x8 = 24).
-
-Please note that these results were obtained with the same fixed hyperparameters for all models. Refer to the table above for the best results after a grid search.
-
-| <center>Model | <center>Run Time (hh:mm:ss) | <center>Results on TyDi QA |
-|----------------------|---------------|---------------------|
-| AraT5-msa-base | <center>00:20:41 | <center>69.92/82.50 |
-| AraT5-base | <center>00:20:53 | <center>68.40/81.97 |
-| AraT5-base-Tweets | <center>00:21:17 | <center>61.67/75.96 |
-| mT5-base | <center>00:28:24 | <center>57.98/72.81 |
-| AraBART-base | <center>00:10:57 | <center>43.76/66.30 |
-| ArabicT5-17GB-small | <center>00:20:00 | <center>70.79/83.85 |
-| ArabicT5-17GB-base | <center>00:23:50 | <center>71.22/84.42 |
-| ArabicT5-17GB-large | <center>00:52:17 | <center>72.86/86.00 |
-
-Please note that we can further speed up our ArabicT5-Base by increasing the batch size, since it can handle a larger batch size than other base-scale models due to its hidden layer size (512).
-
-# Paper
-
-[Generative Approach for Gender-Rewriting Task with ArabicT5](https://aclanthology.org/2022.wanlp-1.55/)
+# Continual Pre-Training of ArabicT5 with T5x
+
+If you want to continue pre-training ArabicT5 on your own data, we have uploaded the raw T5x checkpoint to this link: https://huggingface.co/sultan/ArabicT5-49GB-base/blob/main/arabict5_49GB_base_t5x.tar.gz
+
+We will soon share a tutorial on how you can do that for free with a Kaggle TPU.

# FineTuning our ArabicT5 model on generative and abstractive tasks with FLAX ###
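For the continual pre-training checkpoint linked in the added section above, a minimal sketch of fetching and unpacking the T5x archive with `huggingface_hub`; the repo id and filename come from that link, while the target directory name is an arbitrary choice:

```python
import tarfile
from huggingface_hub import hf_hub_download

# Download the raw T5x checkpoint archive from the model repo, then unpack it
# locally so a T5x run can use it as its initial checkpoint.
archive_path = hf_hub_download(
    repo_id="sultan/ArabicT5-49GB-base",
    filename="arabict5_49GB_base_t5x.tar.gz",
)
with tarfile.open(archive_path, "r:gz") as tar:
    tar.extractall("arabict5_49GB_base_t5x")  # arbitrary local directory name
```

A T5x pre-training or fine-tuning run would then typically point its initial-checkpoint setting at the unpacked checkpoint directory.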
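Before moving to the fine-tuning notebooks in the repository referenced below, a quick sanity check that a checkpoint loads in FLAX can look like the sketch below; the checkpoint id and example strings are placeholders, and it assumes the repo provides Flax-loadable weights:

```python
from transformers import AutoTokenizer, FlaxT5ForConditionalGeneration

# Placeholder checkpoint id; any ArabicT5 checkpoint from the tables above
# should work the same way.
model_id = "sultan/ArabicT5-49GB-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = FlaxT5ForConditionalGeneration.from_pretrained(model_id)

# Toy Arabic input/target pair, only to confirm the forward pass and output shapes.
enc = tokenizer("مثال على نص عربي قصير", return_tensors="np")
dec = tokenizer("ملخص", return_tensors="np")

outputs = model(
    input_ids=enc.input_ids,
    attention_mask=enc.attention_mask,
    decoder_input_ids=dec.input_ids,  # a real training loop would shift labels right
)
print(outputs.logits.shape)  # (batch, target_length, vocab_size)
```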
@@ -92,7 +69,12 @@ https://github.com/salrowili/ArabicT5

# Acknowledgment

-We
+We want to acknowledge the support we received from the TPU Research Cloud (TRC) team, who granted us access to TPUv3 units.
+
+# Paper
+
+[Generative Approach for Gender-Rewriting Task with ArabicT5](https://aclanthology.org/2022.wanlp-1.55/)

# Citation