
Masked word prediction in 103 languages. To use, input a non-English sentence (you can use Google Translate to get one), replacing one of the words with "[MASK]".
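
For example, a minimal fill-mask sketch using the Hugging Face transformers pipeline. The model id placeholder and the French example sentence are assumptions for illustration, not part of this card:

```python
from transformers import pipeline

# Hypothetical model id: replace with this repo's actual path on the Hub.
fill_mask = pipeline("fill-mask", model="zzzotop/<model-id>")

# A French sentence with one word replaced by the [MASK] token.
for prediction in fill_mask("Le chat dort sur le [MASK]."):
    print(prediction["token_str"], prediction["score"])
```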

distilbert-base-multilingual-cased finetuned on the r/explainlikeimfive subset of the ELI5 dataset for English masked language modelling. All knowledge of the target language is acquired from pretraining only. Training set size: 50,000 examples. Trained for 3 epochs with a mask probability of 15%, batch size 8, learning rate 2e-5, and weight decay 0.01. Final model perplexity: 10.22.
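
A minimal sketch of that fine-tuning configuration with the transformers Trainer API. Dataset loading and tokenization are elided; `train_dataset` is a hypothetical pre-tokenized 50,000-example subset, and the actual training script is not included in this card:

```python
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "distilbert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# 15% of tokens are masked dynamically at each training step.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm_probability=0.15
)

training_args = TrainingArguments(
    output_dir="out",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # hypothetical tokenized ELI5 subset
    data_collator=data_collator,
)
trainer.train()
```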