---
language:
- la
tags:
- language model
license: apache-2.0
datasets:
- Tesserae
- Phi5
- Thomas Aquinas
- Patrologia Latina
---

# Cicero-Similis

## Model description

A Latin language model trained on Latin texts and evaluated on the corpus of Cicero, as described in the paper _What Would Cicero Write? -- Examining Critical Textual Decisions with a Language Model_ by Todd Cook, published in Ciceroniana On Line, Vol. V, #2.

## Intended uses & limitations

#### How to use

Normalize the text with JV replacement, tokenize it with CLTK to separate enclitics such as "-que", then:

```
from transformers import BertForMaskedLM, AutoTokenizer, FillMaskPipeline

# Load the tokenizer and the masked language model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("cook/cicero-similis")
model = BertForMaskedLM.from_pretrained("cook/cicero-similis")

# A large top_k returns a long ranked list, so competing readings can be compared
fill_mask = FillMaskPipeline(model=model, tokenizer=tokenizer, top_k=10_000)

# Cicero, De Re Publica, VI, 32, 2
# "animal" is found in A, Q, PhD manuscripts
# 'anima' H^1 Macr. et codd. Tusc.
results = fill_mask("inanimum est enim omne quod pulsu agitatur externo; quod autem est [MASK],")
```
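
As a rough illustration of the steps around the example above, the following sketch shows JV replacement in plain Python and one way to look up where competing readings fall in the ranked output. The helper names `jv_normalize` and `rank_candidates` are hypothetical and are not part of this model or of CLTK, and the enclitic splitting done by CLTK is not reproduced here. The `mock_results` list is invented data that only mimics the `token_str` and `score` fields of fill-mask pipeline output; it is not a real model prediction.

```python
# Hypothetical helpers; not part of the model, transformers, or CLTK.

# JV replacement: map j -> i and v -> u, preserving case.
_JV_TABLE = str.maketrans({"j": "i", "J": "I", "v": "u", "V": "U"})


def jv_normalize(text: str) -> str:
    """Apply JV replacement to a Latin string."""
    return text.translate(_JV_TABLE)


def rank_candidates(results, candidates):
    """Map each candidate reading found in `results` to (1-based rank, score).

    `results` is assumed to be a list of dicts sorted by descending score,
    each with "token_str" and "score" keys, as returned by a fill-mask
    pipeline for a single input containing a single [MASK].
    Comparison is case-insensitive (an assumption about the vocabulary).
    """
    wanted = {c.lower() for c in candidates}
    found = {}
    for rank, item in enumerate(results, start=1):
        token = item["token_str"].strip().lower()
        if token in wanted and token not in found:
            found[token] = (rank, item["score"])
    return found


# Invented data for illustration only; real scores will differ.
mock_results = [
    {"token_str": "corpus", "score": 0.40},
    {"token_str": "animal", "score": 0.25},
    {"token_str": "anima", "score": 0.05},
]

print(jv_normalize("juvenis vivit"))  # iuuenis uiuit
print(rank_candidates(mock_results, ["animal", "anima"]))
# {'animal': (2, 0.25), 'anima': (3, 0.05)}
```

With the real pipeline, the output of `fill_mask(jv_normalize(text))` would take the place of `mock_results`. A reading that is missing from the returned dictionary either fell outside `top_k` or is not a single token in the model's vocabulary.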

#### Limitations and bias

The training data currently excludes modern and 19th-century texts. This narrow focus is deliberate: the model is not intended to be a one-size-fits-all Latin model.

## Training data

Trained on the Phi5, Tesserae, Thomas Aquinas, and Patrologia Latina corpora.

## Training procedure

The model was trained for 5 epochs on a masked language modeling objective with a masking probability of 0.15 and an effective batch size of 32.
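
To make the term "effective batch size" concrete, here is a minimal sketch of how the three stated hyperparameters might be recorded. Only the 5 epochs, the .15 masking rate, and the effective batch size of 32 come from this card. The class name `TrainingSetup` and the split into a per-device batch size of 8, 4 gradient accumulation steps, and 1 device are assumptions chosen for illustration; the actual split used in training is not stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hypothetical record of the training hyperparameters."""

    epochs: int = 5                        # from the model card
    mlm_probability: float = 0.15          # from the model card
    per_device_batch_size: int = 8         # assumption, not stated in the card
    gradient_accumulation_steps: int = 4   # assumption, not stated in the card
    num_devices: int = 1                   # assumption, not stated in the card

    @property
    def effective_batch_size(self) -> int:
        # Number of examples contributing to each optimizer step.
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * self.num_devices
        )


setup = TrainingSetup()
print(setup.effective_batch_size)  # 32
```

Any other split whose product is 32 (for example 32 x 1 x 1, or 4 x 2 x 4) would match the stated effective batch size equally well.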

## Eval results

A novel evaluation metric is proposed in the paper _What Would Cicero Write? -- Examining Critical Textual Decisions with a Language Model_ by Todd Cook, published in Ciceroniana On Line, Vol. V, #2.
|
|
|
### BibTeX entry and citation info |
|
_What Would Cicero Write? -- Examining Critical Textual Decisions with a Language Model_ by Todd Cook, published in Ciceroniana On Line, Vol. V, #2.