swiss-german-canine
metadata
license: cc-by-nc-4.0
language:
  - gsw
  - multilingual
widget:
  - text: Hinder s'Hans-Heiris Huus hani hundert Hase ghöre hueschte.

This model is google/canine-s (Clark et al., TACL 2022), adapted to Swiss German through continued pre-training on Swiss German text.
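CANINE reads raw characters instead of looking words up in a fixed input vocabulary. The following is a minimal sketch of that input representation. It assumes the special code points that the Hugging Face transformers `CanineTokenizer` uses (CLS = 0xE000, SEP = 0xE001); the function name `encode_chars` is hypothetical and not part of any library.

```python
# Special code points as defined by CanineTokenizer in Hugging Face transformers.
CLS = 0xE000
SEP = 0xE001


def encode_chars(text: str) -> list[int]:
    """Map text to CANINE-style input ids: one Unicode code point per character."""
    return [CLS] + [ord(ch) for ch in text] + [SEP]


sentence = "Hinder s'Hans-Heiris Huus hani hundert Hase ghöre hueschte."
ids = encode_chars(sentence)
```

A dialect spelling such as "ghöre" therefore never falls out of vocabulary at the input: "ö" is simply code point 246.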

Training Objective

We used the CANINE-S objective (character-level input, subword-level prediction targets), taking the subword vocabulary from SwissBERT.
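The split between character input and subword targets can be illustrated with a simplified sketch. This is not the actual training code. It masks every character of one chosen subword with the MASK code point (0xE003, as in transformers' `CanineTokenizer`) and returns the subword's vocabulary id as the label. The `Subword` class, the `mask_subword` function, the character offsets, and the vocabulary ids are all hypothetical. In practice the segmentation would come from SwissBERT's tokenizer, and the CANINE masking rate and span selection are not reproduced here.

```python
from dataclasses import dataclass

# Special code points as defined by CanineTokenizer in Hugging Face transformers.
CLS = 0xE000
SEP = 0xE001
MASK = 0xE003


@dataclass
class Subword:
    """Hypothetical subword span: character offsets into the text plus a vocabulary id."""
    start: int     # first character of the subword (inclusive)
    end: int       # end offset (exclusive)
    vocab_id: int  # id in the subword vocabulary (illustrative values only)


def mask_subword(text: str, subwords: list[Subword], index: int) -> tuple[list[int], int, int]:
    """Mask all characters of one subword; return (input_ids, span_start, target_id).

    span_start is the position of the first masked character in input_ids
    (shifted by one for the leading CLS); target_id is the label to predict.
    """
    target = subwords[index]
    chars = [ord(ch) for ch in text]
    for i in range(target.start, target.end):
        chars[i] = MASK
    input_ids = [CLS] + chars + [SEP]
    return input_ids, target.start + 1, target.vocab_id


# Toy segmentation of "hundert Hase" with made-up vocabulary ids.
text = "hundert Hase"
spans = [Subword(0, 7, 101), Subword(8, 12, 202)]
input_ids, pos, label = mask_subword(text, spans, 1)
```

The input the model sees is still a sequence of code points. Only the loss refers to the subword vocabulary, so replacing that vocabulary with SwissBERT's leaves the input side of the model unchanged.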

Training Data

For continued pre-training, we used the following two datasets of written Swiss German:

  1. SwissCrawl (Linder et al., LREC 2020), a collection of Swiss German web text (forum discussions, social media).
  2. A custom dataset of Swiss German tweets.

In addition, we trained the model on an equal amount of Standard German data, consisting of news articles retrieved from Swissdox@LiRI.
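One way to realize the equal-amount mixing described above is sketched below. It assumes that "equal amount" means an equal number of examples. The card does not say whether documents, sentences, or tokens were matched. The function name `balance_corpora` is hypothetical.

```python
import random


def balance_corpora(swiss_german: list[str], standard_german: list[str], seed: int = 0) -> list[str]:
    """Sample the same number of examples from each corpus and shuffle them together."""
    rng = random.Random(seed)
    n = min(len(swiss_german), len(standard_german))
    # Downsample the larger corpus so both languages contribute n examples.
    mixed = rng.sample(swiss_german, n) + rng.sample(standard_german, n)
    rng.shuffle(mixed)
    return mixed
```

Shuffling the combined list keeps training batches from being dominated by one variety for long stretches.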

License

Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).

Citation

@inproceedings{vamvas-etal-2024-modular,
      title={Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect},
      author={Jannis Vamvas and No{\"e}mi Aepli and Rico Sennrich},
      booktitle={First Workshop on Modular and Open Multilingual NLP},
      year={2024},
}