Ancient Greek BERT
The first and only available Ancient Greek sub-word BERT model!
State-of-the-art performance after fine-tuning on Part-of-Speech Tagging and Morphological Analysis.
Pre-trained weights are made available for a standard 12-layer, 768-dimensional BERT-base model.
Further scripts for using the model and fine-tuning it for PoS Tagging are available on our GitHub repository!
Please refer to our paper: "A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek", in Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2021).
How to use
Requirements:
pip install transformers
pip install flair
(The unicodedata module used for preprocessing ships with Python's standard library and needs no separate installation.)
Can be directly used from the HuggingFace Model Hub with:
from transformers import AutoTokenizer, AutoModel
tokeniser = AutoTokenizer.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
model = AutoModel.from_pretrained("pranaydeeps/Ancient-Greek-BERT")
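For example, here is a minimal sketch of encoding a sentence and extracting contextual embeddings once the model is loaded; the sample verse and variable names are purely illustrative:

import torch

# Encode a de-accented, lower-cased sentence (see Training and Eval details below)
inputs = tokeniser("μηνιν αειδε θεα πηληιαδεω αχιληος", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
embeddings = outputs.last_hidden_state  # shape: (1, number_of_subwords, 768)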
Fine-tuning for POS/Morphological Analysis
Please refer to the GitHub repository for the code and details regarding fine-tuning. For orientation, a rough sketch of one possible setup follows below.
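The following is only a minimal sketch of how a flair sequence tagger could be fine-tuned on top of the model; the corpus path, column format and hyperparameters are placeholders and this is not the repository's exact training script:

from flair.datasets import ColumnCorpus
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Hypothetical CoNLL-style treebank with token and PoS columns
corpus = ColumnCorpus("path/to/treebank", {0: "text", 1: "pos"})

# Contextual sub-word embeddings from Ancient-Greek-BERT, updated during training
embeddings = TransformerWordEmbeddings("pranaydeeps/Ancient-Greek-BERT", fine_tune=True)

tag_dictionary = corpus.make_label_dictionary(label_type="pos")
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type="pos",
                        use_crf=True)

ModelTrainer(tagger, corpus).train("taggers/ancient-greek-pos",
                                   learning_rate=3e-5,
                                   mini_batch_size=16,
                                   max_epochs=10)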
Training data
The model was initialised from AUEB NLP Group's Greek BERT and subsequently trained on monolingual data from the First1KGreek Project, the Perseus Digital Library, the PROIEL Treebank, and Gorman's Treebank.
Training and Eval details
Standard de-accenting and lower-casing for Greek, as suggested for AUEB NLP Group's Greek BERT, were applied to the training data (a sketch of this preprocessing follows below). The model was trained on 4 NVIDIA Tesla V100 16GB GPUs for 80 epochs with a maximum sequence length of 512, and reaches a perplexity of 4.8 on the held-out test set. When fine-tuned for PoS Tagging and Morphological Analysis, it achieves state-of-the-art results on all 3 treebanks, averaging >90% accuracy. Please consult our paper or contact me for further questions!
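As a sketch of that preprocessing step, assuming the same convention as Greek BERT, the de-accenting and lower-casing can be done with Python's built-in unicodedata module (the helper name is illustrative):

import unicodedata

def strip_accents_and_lowercase(text):
    # Decompose characters, drop combining marks (accents, breathings and other diacritics), then lower-case
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()

print(strip_accents_and_lowercase("Μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος"))
# -> μηνιν αειδε θεα πηληιαδεω αχιληος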
Cite
If you end up using Ancient-Greek-BERT in your research, please cite the paper:
@inproceedings{ancient-greek-bert,
author = {Singh, Pranaydeep and Rutten, Gorik and Lefever, Els},
title = {A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek},
year = {2021},
booktitle = {Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2021)}
}