Update README.md
# Logion: Machine Learning for Greek Philology

The most advanced Ancient Greek BERT model trained to date! Read the paper on [arXiv](https://arxiv.org/abs/2305.01099) by Charlie Cowen-Breen, Creston Brooks, Johannes Haubold, and Barbara Graziosi.

Starting from the pre-trained weights and tokenizer made available by Pranaydeep Singh's [Ancient Greek BERT](https://huggingface.co/pranaydeeps/Ancient-Greek-BERT), we train on a corpus of over 70 million words of premodern Greek.

Further information on this project, along with code for beam-searching over multiple masked tokens, can be found on [GitHub](https://github.com/charliecb/Logion).

We're adding more models trained with cleaner data and different tokenizations - keep an eye out!

## How to use

Requirements:

```bash
pip install transformers
```

Load the model and tokenizer directly from the HuggingFace Model Hub:

```python
from transformers import BertTokenizer, BertForMaskedLM
tokenizer = BertTokenizer.from_pretrained("cabrooks/LOGION-base")
model = BertForMaskedLM.from_pretrained("cabrooks/LOGION-base")
```

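Continuing from the snippet above, here is a minimal sanity-check sketch for filling a single masked token. It assumes PyTorch is installed alongside `transformers`; the Greek line below (the opening of the Iliad with one word masked) is only an illustrative placeholder, not a claim about model output:

```python
import torch

# Illustrative input: one [MASK] token in an Ancient Greek sentence.
text = "μῆνιν ἄειδε θεὰ Πηληϊάδεω [MASK] οὐλομένην"
inputs = tokenizer(text, return_tensors="pt")

# Forward pass without gradient tracking.
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position in the tokenized input.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

# Print the five highest-scoring candidate tokens for that position.
top_ids = logits[0, mask_pos].topk(5, dim=-1).indices[0].tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
```

For restorations that involve several adjacent masked tokens, see the beam-search code in the [GitHub](https://github.com/charliecb/Logion) repository linked above.
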
## Model pre-training and tokenizer

The model was initialized from Pranaydeep Singh's [Ancient Greek BERT](https://huggingface.co/pranaydeeps/Ancient-Greek-BERT), which itself used a [Modern Greek BERT](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1) as its pre-trained base.
Singh's Ancient Greek BERT was trained on data pulled from the First1KGreek Project, the Perseus Digital Library, the PROIEL Treebank, and Gorman's Treebank. We train further on over 70 million words of premodern Greek, which we are happy to make available upon request. For more information, please see footnote 2 of the [arXiv paper](https://arxiv.org/abs/2305.01099), which also contains details on training and evaluation.

## Cite

If you use this model in your research, please cite the paper:

```
@inproceedings{logion-base,
  author = {Cowen-Breen, Charlie and Brooks, Creston and Haubold, Johannes and Graziosi, Barbara},
  title = {Logion: Machine Learning for Greek Philology},
  year = {2023},
  url = {https://arxiv.org/abs/2305.01099}
}
```