# manchuBERT

manchuBERT is a BERT-base model trained from scratch on romanized Manchu data.
ManNER & ManPOS are manchuBERT models fine-tuned for named entity recognition and part-of-speech tagging, respectively.
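The checkpoint is published on the Hugging Face Hub at the repository URL given in the citation below. As a minimal usage sketch — assuming the repository ships a standard BERT tokenizer and masked-LM head loadable through 🤗 Transformers' `pipeline` API — masked-token prediction on romanized Manchu looks like:

```python
# Sketch, not an official example: assumes the seemdog/manchuBERT checkpoint
# exposes a standard BERT tokenizer and masked-LM head compatible with the
# Transformers fill-mask pipeline.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="seemdog/manchuBERT")

# Romanized Manchu input with one [MASK] token (phrase taken from the
# "Ilan gurun i bithe" title in the data table); the pipeline returns the
# top-scoring candidate fillers with their probabilities.
predictions = fill_mask("ilan [MASK] i bithe")
for p in predictions:
    print(p["token_str"], round(p["score"], 4))
```

Downstream fine-tuning (as done for ManNER and ManPOS) would instead load the weights with `AutoModelForTokenClassification` and train on labeled Manchu data.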
## Data

manchuBERT uses the data augmentation method from *Mergen: The First Manchu-Korean Machine Translation Model Trained on Augmented Data*.
| Data | Number of Sentences (before augmentation) |
|---|---|
| Manwén Lǎodàng – Taizong | 2,220 |
| Ilan gurun i bithe | 41,904 |
| Gin ping mei bithe | 21,376 |
| Yùzhì Qīngwénjiàn | 11,954 |
| Yùzhì Zēngdìng Qīngwénjiàn | 18,420 |
| Manwén Lǎodàng – Taizu | 22,578 |
| Manchu-Korean Dictionary | 40,583 |
## Citation

```bibtex
@misc{jean_seo_2024,
  author    = {Jean Seo},
  title     = {manchuBERT (Revision 64133be)},
  year      = 2024,
  url       = {https://huggingface.co/seemdog/manchuBERT},
  doi       = {10.57967/hf/1599},
  publisher = {Hugging Face}
}
```