# LuxemBERT

LuxemBERT is a BERT model for the Luxembourgish language.
It was trained using 6.1 million Luxembourgish sentences from various sources including the Luxembourgish Wikipedia, the Leipzig Corpora Collection and rtl.lu.
In addition, we partially translated 6.1 million sentences from the German Wikipedia from German to Luxembourgish as means of data augmentation. This gave us a dataset of 12.2 million sentences we used to train our LuxemBERT model.

If you use our model, please cite our paper:
[Will be added later]