DistilRoBERTa (base) Middle High German Charter Masked Language Model
This model is a fine-tuned version of distilroberta-base on Middle High German (gmh; ISO 639-2; c. 1050–1500) charters from the monasterium.net dataset.
Model description
Please refer to this model together with the distilroberta (base-sized model) card or the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Sanh et al. for additional information.
Intended uses & limitations
This model can be used for masked language modeling, i.e., fill-mask tasks, as shown in the sketch below.
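A minimal usage sketch with the transformers fill-mask pipeline, assuming the model ID from the citation URL below; the example sentence is only an illustrative Middle High German-style input, not taken from the training charters.

```python
from transformers import pipeline

# Model ID taken from the citation URL in this card.
fill_mask = pipeline(
    "fill-mask",
    model="atzenhofer/distilroberta-base-mhg-charter-mlm",
)

# Illustrative input; RoBERTa-style tokenizers use "<mask>" as the mask token.
for prediction in fill_mask("Wir tuon kunt allen <mask>, die disen brief sehent."):
    print(prediction["token_str"], prediction["score"])
```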
Training and evaluation data
The model was fine-tuned using the Middle High German Monasterium charters. It was trained on an NVIDIA GeForce GTX 1660 Ti 6 GB GPU.
Training hyperparameters
The following hyperparameters were used during training (see the sketch after the list):
- num_train_epochs: 10
- learning_rate: 2e-5
- weight_decay: 0.01
- train_batch_size: 8
- eval_batch_size: 8
- num_proc: 4
- block_size: 256
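The hyperparameters above map roughly onto a Hugging Face Trainer setup as sketched below. This is an assumption-laden reconstruction, not the original training script: the dataset loading, tokenization, and block grouping are placeholders, and train_batch_size / eval_batch_size are assumed to correspond to the per-device batch sizes.

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForMaskedLM.from_pretrained("distilroberta-base")

# Placeholder corpus standing in for the tokenized Monasterium charters;
# the real preprocessing (block_size=256 grouping with num_proc=4) is not shown here.
raw = Dataset.from_dict({"text": ["Wir tuon kunt allen, die disen brief sehent."]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

training_args = TrainingArguments(
    output_dir="distilroberta-base-mhg-charter-mlm",
    num_train_epochs=10,
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=8,   # assumed mapping of train_batch_size
    per_device_eval_batch_size=8,    # assumed mapping of eval_batch_size
)

# Standard MLM collator that randomly masks tokens for the fill-mask objective.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,   # placeholder split; the actual train/validation split is not published
    eval_dataset=tokenized,
    data_collator=data_collator,
)
trainer.train()
trainer.evaluate()
```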
Training results
| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 2.537000 | 2.112094 |
| 2 | 2.053400 | 1.838937 |
| 3 | 1.900300 | 1.706654 |
| 4 | 1.766200 | 1.607970 |
| 5 | 1.669200 | 1.532340 |
| 6 | 1.619100 | 1.490333 |
| 7 | 1.571300 | 1.476035 |
| 8 | 1.543100 | 1.428958 |
| 9 | 1.517100 | 1.423216 |
| 10 | 1.508300 | 1.408235 |
Perplexity: 4.07
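For reference, masked-LM perplexity is conventionally the exponential of the mean evaluation loss; the sketch below shows that relationship. Applied to the final validation loss above it gives roughly 4.09, so the reported 4.07 was presumably computed on a separate held-out split.

```python
import math

# Perplexity = exp(mean cross-entropy loss).
final_eval_loss = 1.408235  # validation loss after epoch 10 (table above)
print(math.exp(final_eval_loss))  # ≈ 4.09; the card reports 4.07
```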
Updates
- 2023-03-30: Upload
Citation
Please cite as follows when using this model.
@misc{distilroberta-base-mhg-charter-mlm,
  title     = {distilroberta-base-mhg-charter-mlm},
  author    = {Atzenhofer-Baumgartner, Florian},
  year      = {2023},
  url       = {https://huggingface.co/atzenhofer/distilroberta-base-mhg-charter-mlm},
  publisher = {Hugging Face}
}