---
language:
- pl
- en
pipeline_tag: text-classification
widget:
- text: TRUMP needs undecided voters
example_title: example 1
- text: Oczywiście ze Pan Prezydent to nasza duma narodowa!!
example_title: example 2
tags:
- text
- sentiment
- politics
- text-classification
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: sentimenTw-political
results:
- task:
type: text-classification
name: Text Classification
dataset:
type: social media
name: politics
metrics:
- type: f1
  name: F1 macro
value: 71.2
- type: accuracy
value: 74
---

# eevvgg/sentimenTw-political
This model is a fine-tuned version of the multilingual model [cardiffnlp/twitter-xlm-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment). It classifies text sentiment into 3 categories: negative, neutral, positive. It was fine-tuned on a 2k sample of manually annotated Reddit (EN) and Twitter (PL) data.

- **Developed by:** Ewelina Gajewska as a part of the ComPathos project: https://www.ncn.gov.pl/sites/default/files/listy-rankingowe/2020-09-30apsv2/streszczenia/497124-en.pdf
- **Model type:** RoBERTa for sentiment classification
- **Language(s) (NLP):** Multilingual; fine-tuned on 1k English texts from Reddit and 1k Polish tweets
- **License:** [More Information Needed]
- **Finetuned from model:** cardiffnlp/twitter-xlm-roberta-base-sentiment
## Uses
Sentiment classification in multilingual data. Fine-tuned on a 2k sample of English and Polish social media texts from the political domain. The model is suited for short texts (up to 200 tokens).
## How to Get Started with the Model
```python
from transformers import pipeline

model_path = "eevvgg/sentimenTw-political"
sentiment_task = pipeline(task="text-classification", model=model_path, tokenizer=model_path)

sequence = ["TRUMP needs undecided voters",
            "Oczywiście ze Pan Prezydent to nasza duma narodowa!!"]

result = sentiment_task(sequence)
labels = [i["label"] for i in result]  # ['neutral', 'positive']
```
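By default the pipeline returns only the top label per text. A minimal sketch of two optional call-time tweaks, assuming a recent `transformers` version: `top_k=None` to return the full score distribution, and tokenizer truncation to keep inputs within the short-text range noted under Uses (the 200-token cap here is our assumption, not a hard model limit):

```python
from transformers import pipeline

model_path = "eevvgg/sentimenTw-political"
sentiment_task = pipeline(task="text-classification", model=model_path, tokenizer=model_path)

result = sentiment_task(
    ["TRUMP needs undecided voters"],
    top_k=None,        # return scores for all 3 classes, not just the top label
    truncation=True,   # forwarded to the tokenizer
    max_length=200,    # matches the short-text range noted under Uses
)
# result[0] is a list of {'label': ..., 'score': ...} dicts, one per class.
```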
## Model Sources
- **Repository:** Colab notebook
- **Paper:** TBA
- **BibTex citation:** see [Citation](#citation) below
## Training Details
- Trained for 3 epochs with a mini-batch size of 8.
- Final training loss: 0.515.
- See details in the Colab notebook; a minimal fine-tuning sketch follows below.
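The full training setup lives in the Colab notebook. The following is only a minimal sketch of an equivalent `Trainer` configuration using the hyperparameters reported above; `train_dataset` and `eval_dataset` are hypothetical placeholders for the tokenized 2k annotated sample, which is not distributed with this card.

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(base)
# The base checkpoint already predicts 3 classes (negative/neutral/positive).
model = AutoModelForSequenceClassification.from_pretrained(base)

args = TrainingArguments(
    output_dir="sentimenTw-political",
    num_train_epochs=3,              # reported above
    per_device_train_batch_size=8,   # mini-batch size reported above
)

# train_dataset / eval_dataset: hypothetical tokenized datasets with a "labels" column.
trainer = Trainer(model=model, args=args,
                  train_dataset=train_dataset, eval_dataset=eval_dataset)
trainer.train()
```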
### Preprocessing
- Hyperlinks and user mentions (@) were normalized to "http" and "@user" tokens, respectively, and extra spaces were removed (a sketch of this normalization follows below).
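A minimal sketch of that normalization; the exact regular expressions are an assumption, since the card does not specify the patterns used:

```python
import re

def normalize(text: str) -> str:
    """Normalize links and mentions, collapse extra whitespace (assumed patterns)."""
    text = re.sub(r"https?://\S+", "http", text)  # hyperlinks -> "http"
    text = re.sub(r"@\w+", "@user", text)         # user mentions -> "@user"
    return re.sub(r"\s+", " ", text).strip()      # remove extra spaces

print(normalize("Vote!!  https://example.com  @JoeDoe"))  # 'Vote!! http @user'
```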
### Speeds, Sizes, Times
- See Colab notebook
## Evaluation

- Evaluation was run on a held-out sample of 200 texts (10% of the data).
### Results

- accuracy: 74.0
- macro avg: f1 71.2, precision 72.8, recall 70.8
- weighted avg: f1 73.3, precision 74.0, recall 74.0

Per-class results on the evaluation sample:

| class    | precision | recall | f1-score | support |
|----------|-----------|--------|----------|---------|
| negative | 0.752     | 0.901  | 0.820    | 91      |
| neutral  | 0.764     | 0.592  | 0.667    | 71      |
| positive | 0.667     | 0.632  | 0.649    | 38      |
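The per-class table matches the format of scikit-learn's `classification_report`. A minimal sketch of reproducing such numbers on a held-out sample; `texts` and `gold_labels` are hypothetical placeholders for the 200-text evaluation split and its manual annotations, which are not distributed with this card:

```python
from sklearn.metrics import classification_report
from transformers import pipeline

sentiment_task = pipeline("text-classification", model="eevvgg/sentimenTw-political")

# texts: list[str], gold_labels: list[str] -- hypothetical held-out sample.
predictions = [p["label"] for p in sentiment_task(texts)]
print(classification_report(gold_labels, predictions, digits=3))
```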
## Citation

**BibTeX:**
```bibtex
@misc{SentimenTwGK2023,
  author = {Gajewska, Ewelina and Konat, Barbara},
  title = {SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media},
  year = {2023},
  howpublished = {\url{https://huggingface.co/eevvgg/sentimenTw-political}},
}
```
**APA:**

Gajewska, E., & Konat, B. (2023). *SentimenTw XLM-RoBERTa-base Model for Multilingual Sentiment Classification on Social Media*. https://huggingface.co/eevvgg/sentimenTw-political