	---
	language:
	- la
	tags:
	- language model
	license: apache-2.0
	datasets:
	- Tesserae
	- Phi5
	- Thomas Aquinas
	- Patrologia Latina
	---

	# Cicero-Similis

	## Model description

	A Latin language model, trained on Latin texts and evaluated using the corpus of Cicero, as described in the paper _What Would Cicero Write? -- Examining Critical Textual Decisions with a Language Model_ by Todd Cook, published in Ciceroniana On Line, Vol. V, #2.

	## Intended uses & limitations

	#### How to use

	Normalize the text with JV replacement (j to i, v to u) and tokenize it with CLTK so that enclitics such as "-que" are separated, then:

	```python
	from transformers import AutoTokenizer, BertForMaskedLM, FillMaskPipeline

	# Load the tokenizer and masked-language model from the Hugging Face Hub.
	tokenizer = AutoTokenizer.from_pretrained("cook/cicero-similis")
	model = BertForMaskedLM.from_pretrained("cook/cicero-similis")
	# A large top_k keeps low-ranked candidates in the output so competing readings can be looked up.
	fill_mask = FillMaskPipeline(model=model, tokenizer=tokenizer, top_k=10_000)

	# Cicero, De Re Publica, VI, 32, 2
	# "animal" is found in A, Q, PhD manuscripts
	# 'anima' H^1 Macr. et codd. Tusc.
	results = fill_mask("inanimum est enim omne quod pulsu agitatur externo; quod autem est [MASK],")
	```
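
Once `results` is available, the two manuscript readings noted in the comments can be looked up among the candidates. The helper below is a minimal sketch, not part of the model's own tooling: `find_candidate` is a hypothetical name, and `sample_results` is made-up data that only mimics the shape of fill-mask output (a list of dicts, sorted by descending `score`, each with a `token_str`). A reading that the tokenizer splits into several word pieces would not appear as a single candidate, so a `None` result does not by itself mean the model rejects that reading.

```python
from typing import Optional


def find_candidate(results: list[dict], word: str) -> Optional[tuple[int, float]]:
    """Return (1-based rank, score) of `word` among fill-mask results, or None."""
    for rank, item in enumerate(results, start=1):
        # "token_str" holds the predicted token as text; strip stray whitespace.
        if item["token_str"].strip() == word:
            return rank, item["score"]
    return None


# Made-up stand-in for pipeline output, used only to demonstrate the helper.
# These scores are NOT real model predictions.
sample_results = [
    {"token_str": "corpus", "score": 0.30},
    {"token_str": "animal", "score": 0.20},
    {"token_str": "anima", "score": 0.10},
]

for variant in ("animal", "anima"):
    print(variant, find_candidate(sample_results, variant))
```

With the real pipeline, passing `results` in place of `sample_results` would give the rank and score of each reading, provided it occurs among the returned candidates.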

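The preprocessing step mentioned before the snippet (JV replacement, then separating enclitics) can be illustrated without any dependencies. What follows is a simplified stand-in written for illustration, not CLTK's implementation: the function names and the short exception list are hypothetical, and the exact tokens CLTK emits may differ. For real use, CLTK's own normalizer and Latin tokenizer are the better choice.

```python
import re

# Assumption: a plain character mapping approximates JV replacement
# (j -> i, v -> u, preserving case).
_JV_TABLE = str.maketrans({"j": "i", "J": "I", "v": "u", "V": "U"})

# Hypothetical and incomplete list of words that end in "que" without
# carrying the enclitic; CLTK maintains its own exception list.
_QUE_EXCEPTIONS = {
    "atque", "neque", "quoque", "usque", "itaque", "quisque", "denique", "undique",
}


def jv_replace(text: str) -> str:
    """Replace j/v with i/u, keeping case."""
    return text.translate(_JV_TABLE)


def split_que(token: str) -> list[str]:
    """Split a trailing "-que" off a word unless it is a known exception."""
    lowered = token.lower()
    if lowered.endswith("que") and len(lowered) > 3 and lowered not in _QUE_EXCEPTIONS:
        return [token[:-3], "-que"]
    return [token]


def preprocess(text: str) -> str:
    """Normalize and tokenize a sentence, leaving the [MASK] token intact."""
    # Match [MASK] first so it is not broken into brackets and letters.
    words = re.findall(r"\[MASK\]|\w+|[^\w\s]", jv_replace(text))
    tokens = [piece for word in words for piece in split_que(word)]
    return " ".join(tokens)


print(preprocess("Arma virumque cano"))  # Arma uirum -que cano
```

The returned string is what would be passed to `fill_mask` in place of the raw sentence.
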
	#### Limitations and bias

	The training data currently excludes modern and 19th-century texts. That limitation is also the model's strength: it is not intended to be a one-size-fits-all model.

	## Training data

	Trained on the corpora Phi5, Tesserae, Thomas Aquinas, and Patrologia Latina.


	## Training procedure

	Trained for 5 epochs with a masked language modeling probability of 0.15 and an effective batch size of 32.
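
To make these two hyperparameters concrete, here is a small sketch under stated assumptions. It reads the 0.15 figure as the per-token probability of being selected for masking, which is the usual convention but is not spelled out in this card, and it shows only position selection, not whatever replacement scheme the actual training used. The function names and the 8 x 4 split of the batch size are hypothetical; the card gives only the effective total of 32.

```python
import random


def choose_mask_positions(num_tokens: int, mlm_probability: float = 0.15, seed: int = 0) -> list[int]:
    """Pick token positions to mask, each independently with `mlm_probability`."""
    rng = random.Random(seed)
    return [i for i in range(num_tokens) if rng.random() < mlm_probability]


def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer step."""
    return per_device * accumulation_steps * num_devices


tokens = "inanimum est enim omne quod pulsu agitatur externo".split()
positions = choose_mask_positions(len(tokens))
masked = ["[MASK]" if i in positions else tok for i, tok in enumerate(tokens)]
print(" ".join(masked))

# Hypothetical split: 8 examples per device with 4 accumulation steps gives 32.
print(effective_batch_size(per_device=8, accumulation_steps=4))
```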


	## Eval results

	A novel evaluation metric is proposed in the paper _What Would Cicero Write? -- Examining Critical Textual Decisions with a Language Model_ by Todd Cook, published in Ciceroniana On Line, Vol. V, #2.

	### BibTeX entry and citation info
	TODO
	_What Would Cicero Write? -- Examining Critical Textual Decisions with a Language Model_ by Todd Cook, published in Ciceroniana On Line, Vol. V, #2.