ancatmara/historical-irish-tokenizer-sentencepiece

ancatmara committed on Aug 14

Commit ed43313 • 1 Parent(s): 4ba6318

Update README.md

Files changed (1)

README.md +4 -3

@@ -2,10 +2,11 @@
 license: cc-by-nc-sa-4.0
 language:
 - ga
+- sga
+- mga
+- ghc
+- la
 library_name: transformers
-tags:
-- tokenizer
-- irish
 ---
 **Historical Irish SentencePiece tokenizer** was trained on Old, Middle, Early Modern, Classical Modern and pre-reform Modern Irish texts from St. Gall Glosses, Würzburg Glosses, CELT and the book subcorpus Historical Irish Corpus. The training data spans ca. 550 — 1926 and covers a wide variety of genres, such as bardic poetry, native Irish stories, translations and adaptations of continental epic and romance, annals, genealogies, grammatical and medical tracts, diaries, and religious writing. Due to code-switching in some texts, the model has some Latin in the vocabulary.