---
license: cc-by-sa-4.0
pipeline_tag: fill-mask
---

# Model Card for Kashubian HerBERT Base

Kashubian HerBERT Base is a HerBERT Base model adapted to Kashubian: its tokenizer was replaced with a Kashubian one, and the model was fine-tuned on Kashubian Wikipedia.

## Usage

Example code:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("ipipan/kashubian-herbert-base")
model = AutoModel.from_pretrained("ipipan/kashubian-herbert-base")

output = model(
    **tokenizer.batch_encode_plus(
        [
            (
                "Kaszëbi są zôpadnosłowiańską etniczną grëpã, aùtochtonicznym lëdztwã Pòmòrsczi.",
            )
        ],
        padding="longest",
        add_special_tokens=True,
        return_tensors="pt",
    )
)
```
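Since the card declares the `fill-mask` pipeline tag, the model can also be queried for masked-token predictions through the `transformers` pipeline API. A minimal sketch (the masked sentence below is an illustrative example, not from the training data):

```python
from transformers import pipeline

# Load a fill-mask pipeline backed by the Kashubian HerBERT Base checkpoint.
fill_mask = pipeline("fill-mask", model="ipipan/kashubian-herbert-base")

# Use the tokenizer's own mask token rather than hardcoding it, so the
# example works regardless of the exact special-token vocabulary.
mask = fill_mask.tokenizer.mask_token
predictions = fill_mask(f"Kaszëbi są aùtochtonicznym lëdztwã {mask}.", top_k=5)

# Each prediction carries the filled token and its score.
for p in predictions:
    print(p["token_str"], round(p["score"], 3))
```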

## License

CC BY-SA 4.0

## Citation

If you use this model, please cite the following paper:

```bibtex
@misc{rybak2024transferring,
      title={Transferring BERT Capabilities from High-Resource to Low-Resource Languages Using Vocabulary Matching},
      author={Piotr Rybak},
      year={2024},
      eprint={2402.14408},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## Authors

The model was created by Piotr Rybak from the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences.

This work was supported by the European Regional Development Fund as part of the 2014–2020 Smart Growth Operational Programme, CLARIN — Common Language Resources and Technology Infrastructure, project no. POIR.04.02.00-00C002/19.