--- license: cc-by-sa-4.0 datasets: - sinhala-nlp/NSINA - sinhala-nlp/NSINA-Media language: - si --- # Sinhala News Media Identification This is a text classification task created with the [NSINA dataset](https://github.com/Sinhala-NLP/NSINA). This dataset is also released with the same license as NSINA. ## Data Data can be loaded into pandas dataframes using the following code. ```python from datasets import Dataset from datasets import load_dataset train = Dataset.to_pandas(load_dataset('sinhala-nlp/NSINA-Media', split='train')) test = Dataset.to_pandas(load_dataset('sinhala-nlp/NSINA-Media', split='test')) ``` ## Citation If you are using the dataset or the models, please cite the following paper. ~~~ @inproceedings{Nsina2024, author={Hettiarachchi, Hansi and Premasiri, Damith and Uyangodage, Lasitha and Ranasinghe, Tharindu}, title={{NSINA: A News Corpus for Sinhala}}, booktitle={The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)}, year={2024}, month={May}, } ~~~