---
license: apache-2.0
language:
- am
- ti
- ha
- aa
base_model:
- Hailay/EXLMR
- FacebookAI/xlm-roberta-base
pipeline_tag: text-classification
---
## 1. Model Description
**Hailay/FT_EXLMR** is a fine-tuned version of the **EXLMR** model, designed for sentiment analysis and text classification in low-resource African languages such as Tigrinya, Amharic, and Oromo. It keeps the EXLMR architecture and is further fine-tuned to improve performance on multilingual tasks, especially for languages that are underrepresented in existing NLP models.
The model was trained on the AfriSenti-SemEval 2023 dataset, a benchmark for sentiment analysis in African languages, which is publicly available on GitHub: [AfriSent-Semeval-2023 GitHub Repository](https://github.com/afrisenti-semeval/afrisent-semeval-2023).

## 2. Intended Use
This model is intended for:
- Researchers and developers working on multilingual sentiment analysis in African languages.
- Applications that require text classification in low-resource languages.

It is designed specifically for tasks such as:
- Sentiment analysis
- Text classification

**Note:** Without further fine-tuning, the model is unsuitable for tasks like machine translation or named entity recognition.
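
For the intended sentiment analysis use, a minimal sketch of running the checkpoint through the `transformers` text-classification pipeline might look like the following. The helper names `classify` and `summarize` are hypothetical, and the label strings returned depend on the checkpoint's configuration (they may be generic names such as `LABEL_0`), so treat this as an illustration under those assumptions.

```python
from typing import Dict, Iterable, List


def classify(texts: Iterable[str], model_id: str = "Hailay/FT_EXLMR") -> List[Dict]:
    """Run the text-classification pipeline and return one dict per input text.

    Hypothetical helper: calling it downloads the checkpoint from the Hub.
    """
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    return classifier(list(texts))


def summarize(texts: List[str], predictions: List[Dict]) -> List[str]:
    """Pair each input text with its predicted label and confidence score."""
    return [
        f"{pred['label']} ({pred['score']:.2f}): {text}"
        for text, pred in zip(texts, predictions)
    ]
```

Calling `summarize(texts, classify(texts))` on a list of Amharic or Tigrinya sentences would then give one readable line per input.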

## 3. Training Data
The **Hailay/FT_EXLMR** model was trained using the dataset from the
**SemEval 2023 Shared Task 12: Sentiment Analysis in African Languages (AfriSenti-SemEval)**.
This dataset comprises sentiment-labeled text from 14 African languages:

1. Algerian Arabic (arq) - Algeria
2. Amharic (amh) - Ethiopia
3. Hausa (hau) - Nigeria
4. Igbo (ibo) - Nigeria
5. Kinyarwanda (kin) - Rwanda
6. Moroccan Arabic/Darija (ary) - Morocco
7. Mozambican Portuguese (pt-MZ) - Mozambique
8. Nigerian Pidgin (pcm) - Nigeria
9. Oromo (orm) - Ethiopia
10. Swahili (swa) - Kenya/Tanzania
11. Tigrinya (tir) - Ethiopia
12. Twi (twi) - Ghana
13. Xitsonga (tso) - Mozambique
14. Yoruba (yor) - Nigeria
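
As an illustration of working with the sentiment-labeled text, the sketch below parses tab-separated examples and maps the three sentiment labels to integer ids. The column names `tweet` and `label` and the particular id order in `LABEL2ID` are assumptions made for this example. Check them against the files in the repository and the checkpoint's own label mapping before relying on them.

```python
import csv
import io
from collections import Counter
from typing import List, Tuple

# Assumed mapping for illustration; the fine-tuned checkpoint may use a different order.
LABEL2ID = {"negative": 0, "neutral": 1, "positive": 2}


def read_examples(
    tsv_text: str, text_col: str = "tweet", label_col: str = "label"
) -> List[Tuple[str, int]]:
    """Parse TSV content into (text, label_id) pairs."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [
        (row[text_col], LABEL2ID[row[label_col].strip().lower()])
        for row in reader
    ]


def label_counts(examples: List[Tuple[str, int]]) -> Counter:
    """Count how many examples carry each label id, to spot class imbalance."""
    return Counter(label for _, label in examples)
```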

The dataset provides diverse data for training multilingual models such as **Hailay/FT_EXLMR**.
We accessed the dataset from the [AfriSent-Semeval-2023 GitHub Repository](https://github.com/afrisenti-semeval/afrisent-semeval-2023).

The **Hailay/FT_EXLMR** model was trained using the following configuration:
- Epochs: 3
- Learning Rate: 1e-5
- Optimizer: AdamW
- Batch Size: 16
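
The configuration above could be expressed with the `transformers` `TrainingArguments` class roughly as follows. This is a sketch, not the original training script: it assumes the batch size of 16 is per device on a single device with no gradient accumulation, and the output directory name and the helper names are hypothetical.

```python
import math

# Hyperparameters as stated in this card, keyed by TrainingArguments field names.
HYPERPARAMS = {
    "num_train_epochs": 3,
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 16,  # assumption: the stated batch size is per device
    "optim": "adamw_torch",             # AdamW as implemented in PyTorch
}


def build_training_args(output_dir: str = "ft_exlmr_out"):
    """Create TrainingArguments from the hyperparameters above (hypothetical helper)."""
    # Imported lazily so the constants can be used without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)


def total_optimizer_steps(num_examples: int, epochs: int = 3, batch_size: int = 16) -> int:
    """Number of optimizer updates, assuming one device and no gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs
```

For example, a training split of 1,000 examples would give 63 steps per epoch and 189 optimizer steps over the 3 epochs.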

## 4. Evaluation

The model was evaluated using accuracy and loss as the primary metrics. The results are as follows:

Accuracy: The model achieved strong performance on Tigrinya, Amharic, Afar, and Oromo text classification and sentiment analysis tasks.

Loss: Loss values converged steadily over the 3 epochs of training, indicating stable optimization.

The evaluation was carried out on the test set provided in the [AfriSent-Semeval-2023 GitHub Repository](https://github.com/afrisenti-semeval/afrisent-semeval-2023).
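
To make the two metrics concrete, the following sketch computes accuracy and mean cross-entropy loss from gold label ids and model outputs. It is a plain-Python illustration of the standard definitions, not the evaluation script used for this model, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of examples whose predicted label id matches the gold label id."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def mean_cross_entropy(gold: Sequence[int], probs: Sequence[List[float]]) -> float:
    """Average negative log-probability assigned to the gold class.

    `probs[i]` is the predicted probability distribution over classes for example i.
    """
    if len(gold) != len(probs) or not gold:
        raise ValueError("gold and probs must be non-empty and of equal length")
    return -sum(math.log(p[g]) for g, p in zip(gold, probs)) / len(gold)
```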