Identifying and analysing political quotes from the Danish Parliament related to climate change using NLP
KlimaBERT is a sequence classifier fine-tuned to predict whether political quotes are climate-related. For the positive class, 1 ("climate-related"), the model achieves an F1-score of 0.97, with precision of 0.97 and recall of 0.97. The negative class, 0, is defined as "non-climate-related".
KlimaBERT is fine-tuned from the pre-trained DaBERT-uncased model on a training set of 1,000 manually labelled data points. The training set contains both political quotes and summaries of bills from the Danish Parliament.
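As a quick sanity check on the reported metrics, the F1-score is the harmonic mean of precision and recall, so equal precision and recall of 0.97 give an F1-score of 0.97. The sketch below only illustrates that relationship; the function name `f1_score` is chosen here for illustration and is not taken from the project's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the positive class.
reported_f1 = f1_score(0.97, 0.97)
print(round(reported_f1, 2))
```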
The model is created to identify political quotes related to climate change, and performs best on official texts from the Danish Parliament.
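A minimal sketch of how a classifier like this might be applied to new quotes, assuming it loads through the Hugging Face `transformers` sequence-classification classes. The `model_id` argument is a placeholder for the model's hub identifier or a local path, which is not stated here, and the helper names `softmax`, `label_from_logits`, and `classify` are illustrative rather than part of the project. Only the class ids and their meanings come from the description above.

```python
import math

# Class ids as defined above: 1 is the positive class.
ID2LABEL = {0: "non-climate-related", 1: "climate-related"}


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits):
    """Return (label, probability) for one pair of class logits."""
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[class_id], probs[class_id]


def classify(texts, model_id):
    """Classify a list of texts with a fine-tuned sequence classifier.

    model_id is a placeholder (assumption): pass the hub identifier or
    local directory of the fine-tuned model.
    """
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return [label_from_logits(row) for row in logits.tolist()]
```

Calling `classify(["..."], model_id)` would return one `(label, probability)` pair per input text. Since the model performs best on official texts from the Danish Parliament, predictions on other kinds of text should be treated with more caution.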
Fine-tuning
To fine-tune a model similar to KlimaBERT, follow the fine-tuning notebooks.
References
BERT: Devlin, J., M.-W. Chang, K. Lee, and K. Toutanova (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. https://arxiv.org/abs/1810.04805
DaBERT: Certainly (2021). Certainly has trained the most advanced Danish BERT model to date. https://www.certainly.io/blog/danish-bert-model/
Acknowledgements
These resources were created as part of my Master's thesis, so I would like to thank my supervisors Leon Derczynski and Vedran Sekara for their great support throughout the project! And a HUGE thanks to Gustav Gyrst for great sparring and co-development of the tools you find in this repo.
Contact
For further help, questions, or comments, feel free to contact the author, Jonathan Kristensen, on LinkedIn or by creating a "discussion" on this model's page.