---
license: apache-2.0
language:
- am
- ti
- ha
- aa
base_model:
- Hailay/EXLMR
- FacebookAI/xlm-roberta-base
pipeline_tag: text-classification
---
---
## 1. Model Description
**Hailay/FT_EXLMR** is a fine-tuned version of the **EXLMR** model, designed specifically for sentiment analysis and text classification tasks in low-resource African languages such as Tigrinya, Amharic, and Oromo. This model leverages the architecture of EXLMR but has been further fine-tuned to improve its performance on multilingual tasks, especially for languages not widely represented in existing NLP models.
The model was trained using the AfriSent-Semeval-2023 dataset, a benchmark dataset for African languages, which is publicly available on GitHub: [AfriSent-Semeval-2023 GitHub Repository](https://github.com/afrisenti-semeval/afrisent-semeval-2023).
## 2. Intended Use
This model is ideal for:

- Researchers and developers who are working on multilingual sentiment analysis in African languages.
- Applications that require text classification in low-resource languages.

It is designed specifically for tasks such as:

- Sentiment analysis
- Text classification

**Note:** Without further fine-tuning, the model is unsuitable for tasks like machine translation or named entity recognition.
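
For the classification use case described above, a minimal inference sketch might look like the following. It assumes the repository ships a tokenizer and a sequence-classification head that the `transformers` `text-classification` pipeline can load; the helper names `load_classifier` and `classify` are hypothetical, and the label names returned depend on the model's own configuration.

```python
from typing import Callable, Dict, List

MODEL_ID = "Hailay/FT_EXLMR"  # model name taken from this card


def load_classifier(model_id: str = MODEL_ID) -> Callable:
    """Build a text-classification pipeline (downloads the weights on first use)."""
    # Imported lazily so `classify` can be used and tested without transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts: List[str], classifier: Callable) -> List[Dict[str, object]]:
    """Pair each input text with the top label and score returned for it."""
    # For a list of strings the pipeline returns one {"label", "score"} dict per text.
    predictions = classifier(texts)
    return [
        {"text": text, "label": pred["label"], "score": pred["score"]}
        for text, pred in zip(texts, predictions)
    ]
```

Calling `classify(["<a sentence in Amharic or Tigrinya>"], load_classifier())` should return one dictionary per input sentence. Passing the classifier in as an argument keeps the model load separate from the post-processing, so the pipeline is created only once.
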
## 3. Training Data
The **Hailay/FT_EXLMR** model was trained using the dataset from the
**SemEval 2023 Shared Task 12: Sentiment Analysis in African Languages (AfriSenti-SemEval)**.
This dataset comprises sentiment-labeled text from 14 African languages:
1. Algerian Arabic (arq) - Algeria
2. Amharic (amh) - Ethiopia
3. Hausa (hau) - Nigeria
4. Igbo (ibo) - Nigeria
5. Kinyarwanda (kin) - Rwanda
6. Moroccan Arabic/Darija (ary) - Morocco
7. Mozambican Portuguese (pt-MZ) - Mozambique
8. Nigerian Pidgin (pcm) - Nigeria
9. Oromo (orm) - Ethiopia
10. Swahili (swa) - Kenya/Tanzania
11. Tigrinya (tir) - Ethiopia
12. Twi (twi) - Ghana
13. Xitsonga (tso) - Mozambique
14. Yoruba (yor) - Nigeria

The dataset provides diverse sentiment-labeled text for training multilingual models such as **Hailay/FT_EXLMR**.
We accessed the dataset from the [AfriSent-Semeval-2023 GitHub Repository](https://github.com/afrisenti-semeval/afrisent-semeval-2023).

The **Hailay/FT_EXLMR** model was trained using the following configuration:

- Epochs: 3
- Learning Rate: 1e-5
- Optimizer: AdamW
- Batch Size: 16
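
The hyperparameters listed above can be captured in a small configuration object, sketched below. The card does not say which training loop was used, so mapping the values onto Hugging Face `TrainingArguments` keyword names is an assumption, as are the class name `FineTuneConfig` and the single-device, no-gradient-accumulation step count.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters reported in this card (class name is hypothetical)."""

    epochs: int = 3
    learning_rate: float = 1e-5
    batch_size: int = 16
    optimizer: str = "adamw_torch"  # Trainer's identifier for PyTorch AdamW

    def trainer_kwargs(self, output_dir: str) -> dict:
        """Keyword arguments named after transformers.TrainingArguments fields."""
        return {
            "output_dir": output_dir,
            "num_train_epochs": self.epochs,
            "learning_rate": self.learning_rate,
            "per_device_train_batch_size": self.batch_size,
            "optim": self.optimizer,
        }

    def total_steps(self, num_examples: int) -> int:
        """Optimizer steps, assuming one device and no gradient accumulation."""
        return math.ceil(num_examples / self.batch_size) * self.epochs
```

With `transformers` installed, `TrainingArguments(**FineTuneConfig().trainer_kwargs("out"))` would turn these values into a trainer configuration; `"out"` is a placeholder output directory.
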
## 4. Evaluation
The model was evaluated using accuracy and loss as the primary metrics:

- Accuracy: the model achieved strong performance on Tigrinya, Amharic, Afar, and Oromo text classification and sentiment analysis tasks.
- Loss: loss values decreased steadily over the 3 training epochs, indicating stable convergence.

The evaluation was carried out on the test set provided in the [AfriSent-Semeval-2023 GitHub Repository](https://github.com/afrisenti-semeval/afrisent-semeval-2023).
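
As an illustration of the two metrics named in this section, the sketch below computes accuracy and mean loss from integer class labels. Treating the loss as cross-entropy is an assumption (it is the usual objective for sequence classification, but the card does not state it), and the function names are hypothetical.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of examples whose predicted class equals the gold class."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_cross_entropy(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Average negative log-probability assigned to the gold class."""
    if len(y_true) == 0 or len(y_true) != len(probs):
        raise ValueError("inputs must be non-empty and of equal length")
    return -sum(math.log(p[t]) for t, p in zip(y_true, probs)) / len(y_true)
```

Here `probs` holds one probability distribution over classes per example, such as the softmax of the model's logits.
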