xmod-base

X-MOD is a multilingual masked language model trained on filtered CommonCrawl data containing 81 languages. It was introduced in the paper Lifting the Curse of Multilinguality by Pre-training Modular Transformers (Pfeiffer et al., NAACL 2022) and first released in this repository.

X-MOD differs from previous multilingual models such as XLM-R in that it is pre-trained with language-specific modular components (language adapters). During fine-tuning, the language adapters in each transformer layer are frozen.

Usage

Tokenizer

This model reuses the tokenizer of XLM-R.

Input Language

Because this model uses language adapters, you need to specify the language of your input so that the correct adapter can be activated:

from transformers import XmodModel

model = XmodModel.from_pretrained("facebook/xmod-base")
model.set_default_language("en_XX")

The language adapters in this model are listed at the bottom of this model card.
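
set_default_language activates one adapter for every input. For batches that mix languages, a per-example adapter index may be more convenient. The sketch below is a minimal illustration, not part of the model's code: lang_ids_for is a hypothetical helper, and LANGUAGES copies only the first nine codes from the table at the bottom of this card (with the real model, the full ordered list is assumed to be available as model.config.languages).

```python
from typing import List, Sequence


def lang_ids_for(codes: Sequence[str], languages: Sequence[str]) -> List[int]:
    """Map language codes to adapter indices (hypothetical helper).

    `languages` is the ordered list of adapter codes; the position of a
    code in that list is its lang_id.
    """
    index = {code: i for i, code in enumerate(languages)}
    unknown = [c for c in codes if c not in index]
    if unknown:
        raise ValueError(f"No language adapter for: {unknown}")
    return [index[c] for c in codes]


# First nine adapters, in the order given in the language table of this card.
LANGUAGES = ["en_XX", "id_ID", "vi_VN", "ru_RU", "fa_IR",
             "sv_SE", "ja_XX", "fr_XX", "de_DE"]

# One adapter index per example in a mixed English/German/French batch.
batch_ids = lang_ids_for(["en_XX", "de_DE", "fr_XX"], LANGUAGES)
print(batch_ids)  # [0, 8, 7]
```

With the real model, such a list could be converted with torch.tensor(batch_ids) and passed as the lang_ids argument of the forward call, assuming the installed transformers version exposes that argument for XmodModel.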

Fine-tuning

In the experiments in the original paper, the embedding layer and the language adapters were frozen during fine-tuning. The model provides a method for this:

model.freeze_embeddings_and_language_adapters()
# Fine-tune the model ...

Cross-lingual Transfer

After fine-tuning, zero-shot cross-lingual transfer can be tested by activating the language adapter of the target language:

model.set_default_language("de_DE")
# Evaluate the model on German examples ...

Bias, Risks, and Limitations

X-MOD has a similar architecture to XLM-R and was trained on similar data, so please refer to the model card of XLM-R.

Citation

BibTeX:

@inproceedings{pfeiffer-etal-2022-lifting,
    title = "Lifting the Curse of Multilinguality by Pre-training Modular Transformers",
    author = "Pfeiffer, Jonas  and
      Goyal, Naman  and
      Lin, Xi  and
      Li, Xian  and
      Cross, James  and
      Riedel, Sebastian  and
      Artetxe, Mikel",
    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.naacl-main.255",
    doi = "10.18653/v1/2022.naacl-main.255",
    pages = "3479--3495"
}

Languages

This model contains the following language adapters:

lang_id (Adapter index)  Language code  Language
0                        en_XX          English
1                        id_ID          Indonesian
2                        vi_VN          Vietnamese
3                        ru_RU          Russian
4                        fa_IR          Persian
5                        sv_SE          Swedish
6                        ja_XX          Japanese
7                        fr_XX          French
8                        de_DE          German
9                        ro_RO          Romanian
10                       ko_KR          Korean
11                       hu_HU          Hungarian
12                       es_XX          Spanish
13                       fi_FI          Finnish
14                       uk_UA          Ukrainian
15                       da_DK          Danish
16                       pt_XX          Portuguese
17                       no_XX          Norwegian
18                       th_TH          Thai
19                       pl_PL          Polish
20                       bg_BG          Bulgarian
21                       nl_XX          Dutch
22                       zh_CN          Chinese (simplified)
23                       he_IL          Hebrew
24                       el_GR          Greek
25                       it_IT          Italian
26                       sk_SK          Slovak
27                       hr_HR          Croatian
28                       tr_TR          Turkish
29                       ar_AR          Arabic
30                       cs_CZ          Czech
31                       lt_LT          Lithuanian
32                       hi_IN          Hindi
33                       zh_TW          Chinese (traditional)
34                       ca_ES          Catalan
35                       ms_MY          Malay
36                       sl_SI          Slovenian
37                       lv_LV          Latvian
38                       ta_IN          Tamil
39                       bn_IN          Bengali
40                       et_EE          Estonian
41                       az_AZ          Azerbaijani
42                       sq_AL          Albanian
43                       sr_RS          Serbian
44                       kk_KZ          Kazakh
45                       ka_GE          Georgian
46                       tl_XX          Tagalog
47                       ur_PK          Urdu
48                       is_IS          Icelandic
49                       hy_AM          Armenian
50                       ml_IN          Malayalam
51                       mk_MK          Macedonian
52                       be_BY          Belarusian
53                       la_VA          Latin
54                       te_IN          Telugu
55                       eu_ES          Basque
56                       gl_ES          Galician
57                       mn_MN          Mongolian
58                       kn_IN          Kannada
59                       ne_NP          Nepali
60                       sw_KE          Swahili
61                       si_LK          Sinhala
62                       mr_IN          Marathi
63                       af_ZA          Afrikaans
64                       gu_IN          Gujarati
65                       cy_GB          Welsh
66                       eo_EO          Esperanto
67                       km_KH          Central Khmer
68                       ky_KG          Kirghiz
69                       uz_UZ          Uzbek
70                       ps_AF          Pashto
71                       pa_IN          Punjabi
72                       ga_IE          Irish
73                       ha_NG          Hausa
74                       am_ET          Amharic
75                       lo_LA          Lao
76                       ku_TR          Kurdish
77                       so_SO          Somali
78                       my_MM          Burmese
79                       or_IN          Oriya
80                       sa_IN          Sanskrit