jvamvas committed on
Commit
cd7af0e
1 Parent(s): 830102c

Update README.md

Files changed (1)
  1. README.md +25 -0
README.md CHANGED
@@ -6,3 +6,28 @@ language:
  widget:
  - text: "Hinder s'Hans-Heiris Huus hani hundert Hase ghöre hueschte."
  ---
+
+ This model is [**google/canine-s**](https://huggingface.co/google/canine-s) ([Clark et al., TACL 2022](https://aclanthology.org/2022.tacl-1.5/)), adapted to Swiss German text via continued pre-training.
+
+ ## Training Objective
+ We used the CANINE-S objective combined with the subword vocabulary of [SwissBERT](https://huggingface.co/ZurichNLP/swissbert).
+
+ ## Training Data
+ For continued pre-training, we used the following two datasets of written Swiss German:
+ 1. [SwissCrawl](https://icosys.ch/swisscrawl) ([Linder et al., LREC 2020](https://aclanthology.org/2020.lrec-1.329)), a collection of Swiss German web text (forum discussions, social media).
+ 2. A custom dataset of Swiss German tweets.
+
+ In addition, we trained the model on an equal amount of Standard German data, drawn from news articles retrieved from [Swissdox@LiRI](https://t.uzh.ch/1hI).
+
+ ## License
+ Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).
+
+ ## Citation
+ ```bibtex
+ @inproceedings{vamvas-etal-2024-modular,
+ title={Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect},
+ author={Jannis Vamvas and No{\"e}mi Aepli and Rico Sennrich},
+ booktitle={First Workshop on Modular and Open Multilingual NLP},
+ year={2024},
+ }
+ ```
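
As a reading aid for the model card above: CANINE models consume characters directly rather than subword tokens, so the input for the widget sentence is essentially a sequence of Unicode code points. The following is a minimal, dependency-free sketch of that idea and is not part of this commit. The helper names `encode_chars` and `decode_chars` are hypothetical, and the private-use code points chosen for the `[CLS]` and `[SEP]` markers are an assumption made for illustration.

```python
# Sketch only: illustrates character-level input as used by CANINE-style models.
# CLS and SEP values are assumptions (private-use code points), not taken
# from this README.
CLS = 0xE000
SEP = 0xE001


def encode_chars(text: str) -> list[int]:
    """Map each character to its Unicode code point, wrapped in CLS/SEP."""
    return [CLS] + [ord(ch) for ch in text] + [SEP]


def decode_chars(ids: list[int]) -> str:
    """Invert encode_chars, dropping the two special markers."""
    return "".join(chr(i) for i in ids if i not in (CLS, SEP))


# The widget sentence from the README front matter.
example = "Hinder s'Hans-Heiris Huus hani hundert Hase ghöre hueschte."
ids = encode_chars(example)

# One id per character plus the two markers; no vocabulary lookup is needed,
# so dialect spellings never fall out of vocabulary at the input layer.
print(len(example), len(ids))
print(decode_chars(ids) == example)
```

Because every character maps to an id without a learned vocabulary, non-standardized Swiss German spellings are represented losslessly at the input; the subword vocabulary mentioned under "Training Objective" concerns the pre-training targets, not this input encoding.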