KazSAnDRA is a dataset for Kazakh sentiment analysis and the first and largest publicly available resource of its kind. It contains 180,064 reviews collected from a variety of sources, accompanied by numerical ratings from 1 to 5 that quantify customer sentiment. The project also automated Kazakh sentiment classification by developing and evaluating four machine learning models. The models were trained for both polarity classification and score classification, and their performance was assessed under balanced and imbalanced conditions. The most effective model achieved an F1-score of 0.81 for polarity classification and 0.39 for score classification on the test sets.
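
To make the two tasks and the reported metric more concrete, here is a minimal Python sketch. It rests on assumptions that this description does not confirm: the mapping from 1-5 ratings to polarity labels (1-2 as negative, 4-5 as positive, 3 left unlabeled) is hypothetical, the function names `score_to_polarity` and `binary_f1` are illustrative only, and the F1 shown is the plain binary form, since the averaging scheme behind the reported scores is not stated here.

```python
from typing import Optional, Sequence


def score_to_polarity(score: int) -> Optional[int]:
    """Map a 1-5 rating to a polarity label.

    ASSUMPTION (not taken from this description): 1-2 -> 0 (negative),
    4-5 -> 1 (positive), 3 -> None (no polarity label).
    """
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be in 1..5, got {score!r}")
    if score <= 2:
        return 0
    if score >= 4:
        return 1
    return None


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ratings = [1, 2, 3, 4, 5]
    print([score_to_polarity(r) for r in ratings])
    print(binary_f1([1, 0, 1, 1], [1, 1, 0, 1]))
```

Score classification would use the 1-5 ratings directly as five class labels. With five classes instead of two, the task is harder, which is consistent with the lower F1-score reported for it.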

The dataset and fine-tuned models are open access and available for download under the Creative Commons Attribution 4.0 International License (CC BY 4.0) through our GitHub repository.

DOI: https://doi.org/10.48550/arXiv.2403.19335

Data: https://github.com/IS2AI/KazSAnDRA