# Logion: Machine Learning for Greek Philology

The most advanced Ancient Greek BERT model trained to date! Read the paper on [arXiv](https://arxiv.org/abs/2305.01099) by Charlie Cowen-Breen, Creston Brooks, Johannes Haubold, and Barbara Graziosi.

Originally based on the pre-trained weights and tokenizer made available by Pranaydeep Singh's [Ancient Greek BERT](https://huggingface.co/pranaydeeps/Ancient-Greek-BERT), we train on a corpus of over 70 million words of premodern Greek.

Further information on this project, along with code for beam-searching over multiple masked tokens, can be found on [GitHub](https://github.com/charliecb/Logion).

We're adding more models trained with cleaner data and different tokenizations - keep an eye out!
## How to use

Requirements:

```bash
pip install transformers
```

Load the model and tokenizer directly from the Hugging Face Model Hub:

```python
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("cabrooks/LOGION-base")
model = BertForMaskedLM.from_pretrained("cabrooks/LOGION-base")
```

## Model pre-training and tokenizer

The model was initialized from Pranaydeep Singh's [Ancient Greek BERT](https://huggingface.co/pranaydeeps/Ancient-Greek-BERT), which was itself initialized from a [Modern Greek BERT](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1).
Singh's Ancient Greek BERT was trained on data from the First1KGreek Project, the Perseus Digital Library, the PROIEL Treebank, and Gorman's Treebank. We train further on over 70 million words of premodern Greek, which we are happy to make available upon request. For more information, see footnote 2 of the [arXiv paper](https://arxiv.org/abs/2305.01099), which also gives details on training and evaluation.
## Cite

If you use this model in your research, please cite the paper:

```bibtex
@inproceedings{logion-base,
  author = {Cowen-Breen, Charlie and Brooks, Creston and Haubold, Johannes and Graziosi, Barbara},
  title = {Logion: Machine Learning for Greek Philology},
  year = {2023},
  url = {https://arxiv.org/abs/2305.01099}
}