Update README.md
README.md
## Results on TyDi QA, HARD, Sentiment Analysis, Sarcasm Detection, and XL-Sum (best scores in bold)

| Model | TyDi QA | HARD | ArSarcasm-v2-Sentiment | ArSarcasm-v2-Sarcasm | XL-Sum |
|-------------------|:---------------:|:---------:|:---------------:|:---------:|:---------:|
| AraT5-Base | 70.36/84.21 | 96.49 | 69.7/72.63 | 60.44 | 30.31 |
| AraT5-Base-MSA | 70.90/84.00 | **96.52** | 70.03/72.73 | 60.69 | 27.36 |
| AraT5-Base-Tweets | 65.14/79.00 | 96.26 | 70.67/73.52 | 61.11 | 25.08 |
| mT5-Base | 72.20/84.13 | 96.24 | 67.33/68.78 | 52.18 | 25.68 |
| ArabicT5-Base | 70.79/84.76 | 96.36 | 68.93/71.20 | 58.93 | 29.19 |
| ArabicT5-Large | 73.29/86.08 | 96.40 | 70.4/73.01 | 59.79 | 30.30 |
| ArabicT5-xLarge | **75.46/87.12** | 96.50 | **72.23/75.17** | **61.66** | **31.70** |

Evaluation metrics: TyDi QA (EM/F1), HARD (accuracy), ArSarcasm-v2 sentiment analysis (accuracy / F1-PN, the F1 averaged over the positive and negative classes), ArSarcasm-v2 sarcasm detection (F1 on the sarcastic class), and XL-Sum (ROUGE-L with stemming).

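
For readers reproducing these numbers, here is a minimal sketch of how SQuAD-style EM/F1 and the two classification F1 variants are commonly computed. It is an illustration, not the evaluation code behind the table: the text normalization is simplified (the official SQuAD script also strips English articles), and the label names `POS`/`NEG` are assumptions about the dataset's label set. Per-example QA scores are usually averaged over the dataset and multiplied by 100, taking the maximum over gold answers when an example has several.

```python
import string
from collections import Counter

# Simplified normalization (an assumption, not the official script): lowercase,
# drop ASCII punctuation plus the Arabic comma, semicolon and question mark,
# then split on whitespace.
_PUNCT = set(string.punctuation) | {"\u060c", "\u061b", "\u061f"}


def normalize(text: str) -> list[str]:
    kept = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    return kept.split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Bag-of-tokens F1 between a predicted and a gold answer span."""
    pred, ref = normalize(prediction), normalize(gold)
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def class_f1(y_true: list, y_pred: list, label) -> float:
    """F1 of a single class, e.g. the sarcastic class for sarcasm detection."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_pn(y_true: list, y_pred: list, positive="POS", negative="NEG") -> float:
    """F1-PN: mean of positive-class and negative-class F1 (no separate neutral term)."""
    return (class_f1(y_true, y_pred, positive) + class_f1(y_true, y_pred, negative)) / 2
```

Neutral predictions still count against F1-PN through false positives and false negatives; they simply contribute no F1 term of their own.
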
The full details of our grid search for all models on all of the tasks above can be downloaded from this link: https://github.com/salrowili/ArabicT5/raw/main/ArabicT5_Grid_Search.zip

For the XL-Sum task, we select the best run for each model on the evaluation set. We score with the official XL-Sum evaluation script, which applies stemming; this can yield higher scores than papers that compute ROUGE without stemming. The official XL-Sum paper also uses stemming.

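
To illustrate why stemming can raise ROUGE-L, here is a minimal sentence-level ROUGE-L sketch (the harmonic mean of LCS-based precision and recall) with an optional stemmer. It is not the official XL-Sum script: tokenization here is plain whitespace splitting, and `strip_definite_article` is a hypothetical toy stemmer that only removes the Arabic definite article "ال", far cruder than a real Arabic stemmer.

```python
from typing import Callable, Optional


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via a row-by-row dynamic program."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str,
               stemmer: Optional[Callable[[str], str]] = None) -> float:
    """Sentence-level ROUGE-L: harmonic mean of LCS precision and recall."""
    ref = reference.split()   # whitespace tokenization (assumption)
    cand = candidate.split()
    if stemmer is not None:
        ref = [stemmer(tok) for tok in ref]
        cand = [stemmer(tok) for tok in cand]
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def strip_definite_article(token: str) -> str:
    """Hypothetical toy stemmer: drop a leading "ال" if at least two letters remain."""
    if token.startswith("\u0627\u0644") and len(token) > 3:
        return token[2:]
    return token
```

With the toy stemmer, الكتاب ("the book") and كتاب ("book") become the same token, so for the reference "الكتاب جديد" and the candidate "كتاب جديد" the score rises from 0.5 to 1.0. A real stemmer merges many more surface forms, which is why stemmed and unstemmed ROUGE scores are not directly comparable.
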
Although AraT5-Base slightly exceeds ArabicT5-Large on XL-Sum in the table above (30.31 vs. 30.30), ArabicT5-Large scores higher in most runs, as the grid-search file shows.

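
The selection rule described above (report the run with the best eval-set score) and a run-by-run comparison between two models can be sketched as follows. The `Run` record and its fields are hypothetical and do not describe the layout of the grid-search file; pairing runs by a shared hyperparameter configuration is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One grid-search run (hypothetical record; field names are assumptions)."""
    config: str        # e.g. a learning-rate / epochs combination
    eval_score: float  # score on the evaluation set, used for selection
    test_score: float  # score on the test set, used for reporting


def select_best_run(runs: list[Run]) -> Run:
    """Return the run with the highest eval-set score; ties keep the first run."""
    return max(runs, key=lambda run: run.eval_score)


def count_wins(runs_a: dict[str, Run], runs_b: dict[str, Run]) -> tuple[int, int]:
    """Count shared configs where model A's test score beats model B's.

    Returns (wins for A, number of shared configs).
    """
    shared = runs_a.keys() & runs_b.keys()
    wins = sum(runs_a[c].test_score > runs_b[c].test_score for c in shared)
    return wins, len(shared)
```

Under these assumptions, "better in most runs" corresponds to `wins > total / 2` for `count_wins(large_runs, base_runs)`, while the table reports only the single run chosen by `select_best_run`. That is how one model can lead in the table yet trail across most runs.
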
# Speedup Results