---
license: mit
datasets:
- HausaNLP/NaijaSenti-Twitter
language:
- ha
metrics:
- accuracy
- f1
- precision
- recall
base_model: google-bert/bert-base-cased
pipeline_tag: text-classification
library_name: transformers
tags:
- NLP
- sentiment-analysis
- hausa
---

**Model Name**: Hausa Sentiment Analysis

**Model ID**: `Kumshe/Hausa-sentiment-analysis`

**Language**: Hausa

---

### **Model Description**

This is a BERT-based model fine-tuned for sentiment analysis in Hausa. It classifies social media text into one of three sentiment categories: positive, negative, or neutral.

### **Intended Use**

- **Primary Use Case**: Sentiment analysis of Hausa social media content, such as tweets or Facebook posts.
- **Target Users**: NLP researchers, businesses analyzing social media, and developers building sentiment analysis tools for Hausa-language content.
- **Example Usage**:

  ```python
  import torch
  from transformers import AutoTokenizer, AutoModelForSequenceClassification

  # Load the model and tokenizer
  model_id = "Kumshe/Hausa-sentiment-analysis"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForSequenceClassification.from_pretrained(model_id)
  model.eval()

  # Encode the input text
  inputs = tokenizer("Your Hausa text here", return_tensors="pt", truncation=True)

  # Get model predictions without tracking gradients
  with torch.no_grad():
      outputs = model(**inputs)

  # Map the highest-scoring logit to the label name stored in the model config
  predicted_id = outputs.logits.argmax(dim=-1).item()
  print(model.config.id2label[predicted_id])
  ```

### **Model Architecture**

- **Base Model**: BERT (Bidirectional Encoder Representations from Transformers)
- **Pre-trained Model**: `bert-base-cased` from the Hugging Face Transformers library.
- **Fine-Tuned Model**: Fine-tuned for 40 epochs on a Hausa sentiment dataset.

### **Training Data**

- **Data Source**: The model was trained on a dataset of 35,000 examples from social media platforms such as Twitter and Facebook.
- **Data Split**:
  - **Training Set**: 80% of the data
  - **Validation Set**: 20% of the data

### **Training Details**

- **Number of Epochs**: 40
- **Batch Size**:
  - Per-device training batch size: 32
  - Per-device evaluation batch size: 64
- **Warm-up Steps**: 10
- **Weight Decay**: 0.01
- **Optimizer**: AdamW
- **Training Hardware**: Trained on Kaggle using 2 NVIDIA T4 GPUs.

### **Evaluation Metrics**

- **Evaluation Loss**: 0.6265
- **Accuracy**: 73.47%
- **F1 Score**: 73.47%
- **Precision**: 73.54%
- **Recall**: 73.47%

### **Model Performance**

The model reaches roughly 73% accuracy on the evaluation data, with precision, recall, and F1 score within 0.1 percentage points of one another. This balance makes it a reasonable choice for general sentiment analysis of Hausa text.

### **Limitations**

- The model may not generalize well to Hausa text outside of social media (e.g., formal writing or literature).
- Performance may degrade on text containing slang or regional dialects that are not well represented in the training data.
- The model reflects the examples in its training dataset, so biases in that data may affect predictions.

### **Ethical Considerations**

- Sentiment analysis models can amplify biases present in the training data.
- Use the model cautiously in sensitive applications to avoid unintended consequences.
- Consider privacy and data protection laws, especially when analyzing social media content.

### **License**

- MIT, as declared in the model card metadata.

### **Citation**

If you use this model in your work, please cite it as follows:

```
@misc{Kumshe2024HausaSentimentAnalysis,
  author = {Umar Muhammad Mustapha Kumshe},
  title = {Hausa Sentiment Analysis},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Kumshe/Hausa-sentiment-analysis}},
}
```

### **Contributions**

This model was fine-tuned by Umar Muhammad Mustapha Kumshe.
Feel free to contribute, provide feedback, or raise issues on the [model repository](https://huggingface.co/Kumshe/Hausa-sentiment-analysis).
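
For anyone who wants to reproduce or extend the figures under Evaluation Metrics, the following is a minimal sketch of how accuracy, precision, recall, and F1 can be computed for a three-class task. The card does not state which averaging mode was used. The sketch assumes support-weighted averaging, because that mode always yields a recall equal to accuracy, which matches the reported numbers. The function name `weighted_metrics` and the toy labels are hypothetical and are not taken from the author's training code.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total

    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / actual if actual else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = actual / total  # class share of the true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented for illustration only (not from the training data)
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]

scores = weighted_metrics(y_true, y_pred)
print(scores)
```

With real predictions, `y_pred` would hold the label names obtained from the model and `y_true` the gold labels of the validation split.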
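
The values under Training Data and Training Details also allow a rough estimate of the number of optimizer steps. The sketch below is back-of-the-envelope arithmetic, not a record of the actual run. It assumes that the 35,000 examples are the full dataset before the 80/20 split, that both GPUs were used data-parallel, that there was no gradient accumulation, and that the last partial batch was kept. None of these assumptions is stated in the card.

```python
import math

# Figures stated in the card
total_examples = 35_000
train_fraction = 0.80
per_device_train_batch = 32
num_gpus = 2
epochs = 40
warmup_steps = 10

# Assumptions (not stated in the card): data-parallel training on both GPUs,
# no gradient accumulation, and the final partial batch is kept.
train_examples = round(total_examples * train_fraction)
effective_batch = per_device_train_batch * num_gpus
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_share = warmup_steps / total_steps

print(train_examples, steps_per_epoch, total_steps)
print(f"warm-up share of training: {warmup_share:.4%}")
```

Under these assumptions the 10 warm-up steps cover only a very small fraction of training, so the learning rate reaches its peak almost immediately.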