aristoBERTo

aristoBERTo is a transformer model for ancient Greek, a low resource language. We initialized the pre-training with weights from GreekBERT, a Greek version of BERT which was trained on a large corpus of modern Greek (~ 30 GB of texts). We continued the pre-training with an ancient Greek corpus of about 900 MB, which was scrapped from the web and post-processed. Duplicate texts and editorial punctuation were removed.

Applied to the processing of ancient Greek, aristoBERTo outperforms xlm-roberta-base and mdeberta in most downstream tasks like the labeling of POS, MORPH, DEP and LEMMA.

aristoBERTo is provided by the Diogenet project of the University of California, San Diego.

Intended uses

This model was created for fine-tuning with spaCy and the ancient Greek Universal Dependency datasets as well as a NER corpus produced by the Diogenet project. As a fill-mask model, AristoBERTo can also be used in the restoration of damaged Greek papyri, inscriptions, and manuscripts.

It achieves the following results on the evaluation set:

Loss: 1.6323

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 20.0
mixed_precision_training: Native AMP

Training results

Training Loss	Epoch	Step	Validation Loss
1.377	20.0	3414220	1.6314

Framework versions

Transformers 4.14.0.dev0
Pytorch 1.10.0+cu102
Datasets 1.16.1
Tokenizers 0.10.3