---
license: cc-by-sa-4.0
pipeline_tag: fill-mask
---

# Model Card for Kashubian HerBERT Base
Kashubian HerBERT Base is a HerBERT Base model equipped with a Kashubian tokenizer and fine-tuned on the Kashubian Wikipedia.
## Usage

Example code:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("ipipan/kashubian-herbert-base")
model = AutoModel.from_pretrained("ipipan/kashubian-herbert-base")

output = model(
    **tokenizer.batch_encode_plus(
        [
            # "Kashubians are a West Slavic ethnic group, the autochthonous
            # people of Pomerania."
            "Kaszëbi są zôpadnosłowiańską etniczną grëpã, aùtochtonicznym lëdztwã Pòmòrsczi.",
        ],
        padding='longest',
        add_special_tokens=True,
        return_tensors='pt',
    )
)
```
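Since the model card is tagged `fill-mask`, the checkpoint can also be queried through the `fill-mask` pipeline. A minimal sketch; the masked sentence is an illustrative variant of the example above, not taken from the model's documentation:

```python
from transformers import pipeline

# Load the masked-language-modeling pipeline for this checkpoint.
fill = pipeline("fill-mask", model="ipipan/kashubian-herbert-base")

# Use the tokenizer's own mask token rather than hard-coding the mask string.
sentence = f"Kaszëbi są zôpadnosłowiańską etniczną {fill.tokenizer.mask_token}."
predictions = fill(sentence)

# Each prediction is a dict with the proposed token and its score.
for p in predictions:
    print(p["token_str"], round(p["score"], 3))
```

The pipeline returns the top candidate tokens for the masked position together with their probabilities, which is a quick way to sanity-check the model on Kashubian text.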
## License

CC BY-SA 4.0
## Citation

If you use this model, please cite the following paper:

```bibtex
@misc{rybak2024transferring,
      title={Transferring BERT Capabilities from High-Resource to Low-Resource Languages Using Vocabulary Matching},
      author={Piotr Rybak},
      year={2024},
      eprint={2402.14408},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
## Authors

The model was created by Piotr Rybak from the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences.

This work was supported by the European Regional Development Fund as part of the 2014–2020 Smart Growth Operational Programme, CLARIN — Common Language Resources and Technology Infrastructure, project no. POIR.04.02.00-00C002/19.