Dataset
The following Turkish dataset was used for fine-tuning: https://huggingface.co/datasets/maydogan/TRSAv1
TRSAv1 (Turkish Sentiment Analysis Version 1) Dataset
This dataset was produced to contribute to Turkish NLP studies. It contains 150,000 samples in total: 50,000 negative, 50,000 positive, and 50,000 neutral. It can be used in text classification and sentiment analysis studies, provided the related study is cited.
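As a rough sketch, the class balance described above could be checked after loading the dataset with the `datasets` library. The split name `"train"` and the label column name are assumptions (they are not stated here), so the column name is passed in by the caller; `class_balance` and `load_label_balance` are hypothetical helper names.

```python
from collections import Counter

DATASET_ID = "maydogan/TRSAv1"  # taken from the dataset URL above


def class_balance(labels):
    """Return the fraction of samples belonging to each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def load_label_balance(label_column, split="train"):
    """Download the dataset and compute its class balance.

    Assumptions: the split is called "train" and `label_column` names the
    column holding the sentiment label; check the dataset page for the
    actual names before relying on this.
    """
    # Imported lazily so class_balance() works without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset(DATASET_ID, split=split)
    return class_balance(dataset[label_column])
```

If the split holds the full dataset as described, each of the three classes should come out at roughly one third.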
Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("Marzu39/bert-turkish-text-classification")
model = AutoModelForSequenceClassification.from_pretrained("Marzu39/bert-turkish-text-classification")
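Once loaded, the model can be used for inference roughly as follows. This is a minimal sketch, not the authors' evaluation code: it assumes PyTorch is installed, reads the label names from `model.config.id2label` (whatever the checkpoint defines), and uses hypothetical helper names `softmax`, `top_label`, and `classify`.

```python
import math

MODEL_ID = "Marzu39/bert-turkish-text-classification"


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, id2label):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def classify(text):
    """Classify one text with the fine-tuned model (downloads it on first use)."""
    # Imported lazily so the helpers above work without the heavy dependencies.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return top_label(logits, model.config.id2label)


# Example call (requires network access and PyTorch):
# label, probability = classify("Bu ürün harika, çok memnun kaldım.")
```

For classifying many texts, loading the tokenizer and model once outside the function would avoid repeated initialization.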
Training hyperparameters
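from transformers import TrainingArguments

# Note: output_dir is not listed in the original configuration;
# some transformers versions require it to be passed explicitly.
training_args = TrainingArguments(
    do_train=True,
    do_eval=True,
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    warmup_steps=100,
    weight_decay=0.01,
    logging_strategy="steps",
    logging_steps=50,
    evaluation_strategy="epoch",
    eval_steps=50,  # only takes effect when the evaluation strategy is "steps"
    save_strategy="epoch",
    fp16=False,
    load_best_model_at_end=True,
)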
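To give a feel for what these settings imply, the sketch below estimates the number of optimizer steps. It assumes a single device and no gradient accumulation, and the training-set size is a parameter because the train/eval split is not stated here; `training_schedule` is a hypothetical helper, not part of `transformers`.

```python
import math


def training_schedule(n_train, batch_size=8, epochs=3,
                      warmup_steps=100, logging_steps=50):
    """Estimate step counts for a single-device run without gradient accumulation."""
    # The last, possibly smaller, batch still counts as one step.
    steps_per_epoch = math.ceil(n_train / batch_size)
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        # Share of training spent in learning-rate warmup.
        "warmup_fraction": warmup_steps / total_steps,
        # Number of times training metrics are logged.
        "log_events": total_steps // logging_steps,
    }


# Illustration only: assumes all 150,000 samples are used for training.
schedule = training_schedule(150_000)
```

Under that illustrative assumption, the run has 18,750 steps per epoch and 56,250 steps in total, so the 100 warmup steps cover well under 1% of training.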
Citation
Please cite the following paper where appropriate:
@article{arzu2023turkcce,
title={T{\"u}rk{\c{c}}e Duygu S{\i}n{\i}fland{\i}rma {\.I}{\c{c}}in Transformers Tabanl{\i} Mimarilerin Kar{\c{s}}{\i}la{\c{s}}t{\i}r{\i}lmal{\i} Analizi},
author={Arzu, Mehmet and Aydo{\u{g}}an, Murat},
journal={Computer Science},
number={IDAP-2023},
pages={1--6},
year={2023},
publisher={Ali KARCI}
}
Summary
Transformer-based sentiment classification has recently been widely studied in natural language processing and machine learning. It has many applications, such as interpreting and classifying emotional expressions in text, social media analysis, market research, and user experience analysis. This study therefore aims to perform sentiment classification using Transformer-based architectures.