---
license: apache-2.0
datasets:
- akoksal/muri-it
language:
- afr
- amh
- ara
- aze
- bel
- ben
- bul
- cat
- ceb
- ces
- cos
- cym
- dan
- deu
- ell
- eng
- epo
- est
- eus
- fas
- fin
- fra
- fry
- gla
- gle
- glg
- guj
- hat
- hau
- haw
- hbs
- heb
- hin
- hun
- hye
- ibo
- isl
- ita
- jav
- jpn
- kan
- kat
- kaz
- khm
- kir
- kor
- kur
- lao
- lat
- lav
- lit
- ltz
- mal
- mar
- mkd
- mlg
- mlt
- mon
- mri
- msa
- mya
- nep
- nld
- nor
- nya
- pan
- pol
- por
- pus
- ron
- rus
- sin
- slk
- slv
- smo
- sna
- snd
- som
- sot
- spa
- sqi
- sun
- swa
- swe
- tam
- tel
- tgk
- tha
- tur
- ukr
- urd
- uzb
- vie
- xho
- yid
- yor
- zho
- zul
base_model:
- google/mt5-xxl
pipeline_tag: text2text-generation
---
# MURI-101: Multilingual Instruction-Following Model for 101 Languages (mT5-XXL)

MURI-101 is a multilingual instruction-following model fine-tuned on a subset of the [**MURI-IT**](https://huggingface.co/datasets/akoksal/muri-it) dataset. It supports **101 languages** and outperforms most multilingual models on both **natural language understanding (NLU)** and **natural language generation (NLG)** tasks, especially in low-resource settings.

The model was trained on data built with multilingual reverse instructions, which keeps outputs culturally and linguistically appropriate for the target language and reduces translation artifacts.

[Paper](https://arxiv.org/abs/2409.12958)

### Model Architecture
- **Base Model**: mT5-XXL
- **Training Data**: Subset of MURI-IT
- **Training Setup**: Trained with [t5x](https://github.com/google-research/t5x) on a TPU v4-32. Batch size: 64, data packing enabled, learning rate: 3e-4 with no scheduler, 5 epochs.

## Results

We compare **MURI-101** against state-of-the-art models for multilingual instruction following. MURI-101 outperforms most multilingual models, except for Aya, across both NLU and NLG datasets.

| Language | Okapi | mT0  | mT0x | Aya-101 | MURI-101 |
|----------|-------|------|------|---------|----------|
| arb      | 27.7  | 31.5 | 31.6 | 38.2    | 36.5     |
| ben      | 26.8  | 31.6 | 30.2 | 35.8    | 33.0     |
| cat      | 30.5  | 32.8 | 32.6 | 39.6    | 38.8     |
| dan      | 31.8  | 33.0 | 32.0 | 39.7    | 38.4     |
| deu      | 31.7  | 32.7 | 32.5 | 39.7    | 38.9     |
| ...      | ...   | ...  | ...  | ...     | ...      |
| vie      | 27.5  | 30.9 | 31.1 | 34.8    | 36.8     |
| zho      | 28.2  | 32.5 | 31.6 | 38.3    | 36.9     |
| Avg.     | 28.8  | 31.5 | 30.8 | 37.3    | 36.0     |

Additionally, our model complements Aya effectively, especially in low-resource settings.

| Language | mT5  | Aya_1 | Aya_1 + MURI_1 |
|----------|------|-------|----------------|
| aze      | 20.4 | 37.0  | 39.5           |
| bel      | 22.4 | 32.1  | 33.7           |
| bul      | 20.7 | 34.4  | 38.1           |
| cym      | 18.4 | 33.0  | 35.5           |
| gla      | 19.3 | 28.7  | 35.2           |
| kaz      | 19.8 | 44.7  | 46.7           |
| khm      | 16.5 | 30.0  | 31.3           |
| lao      | 21.3 | 32.7  | 33.0           |
| slk      | 19.2 | 38.1  | 39.1           |
| slv      | 18.9 | 40.3  | 39.6           |
| Avg.     | 19.7 | 35.1  | **37.2**       |

## Use

You can load and use the model as follows:

### AutoModelForSeq2SeqLM
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

muri = AutoModelForSeq2SeqLM.from_pretrained("akoksal/muri-101")
tokenizer = AutoTokenizer.from_pretrained("akoksal/muri-101")

instruction = "Verilen cümlenin pozitif mi negatif mi olduğunu tahmin edin: Hayatta kesinlikle izlenmemesi gereken filmler kategorisindeki listemin en başına bu filmi koyarım."
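# Note: `device` is used below but was not defined in the original snippet.
# The lines that follow are an assumed setup (not part of the original card)
# to make the example runnable end to end; adjust them to your hardware.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
muri = muri.to(device)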
# English translation of the Turkish instruction: Guess whether the given sentence is positive or negative: I would put this movie at the very top of my list of movies that should absolutely never be watched.
inputs = tokenizer(instruction, return_tensors="pt").to(device)
outputs = muri.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# > negatif
# (negative)
```

### Pipeline
```python
from transformers import pipeline

muri = pipeline("text2text-generation", model="akoksal/muri-101")

muri("""این مقاله را خلاصه کنید
...تیم دانش‌آموزی کاوش باستانی یک بطری حاوی پیغام ۲۰۰ ساله در شمال فرانسه پیدا کردند""",
     max_new_tokens=150,
     do_sample=True,
     temperature=0.9,
     top_p=0.8)
# English translation of the Persian input:
# Summarize this article
# A student team of archaeologists found a bottle containing a 200-year-old message in northern France ... [300 words]

# > در طول سالیان متمادی باستان شناسان فرانسوی تلاش زیادی برای پیدا کردن آثار و اشیای باستانی انجام داده اند اما این بار پیدا شدن بطری حاوی پیغامی به بیش از دو قرن پیش از آن تاریخ نشان می دهد.
# > Over the years, French archaeologists have made great efforts to find ancient works and objects, but this time the find is a bottle containing a message dating back more than two centuries.
```

Thanks to [Google's TRC program](https://sites.research.google/trc/about/) for supporting the training of this model.

Check out [the paper](https://arxiv.org/abs/2409.12958) for more detailed information on the experiments and results.

## Citation
```
@misc{koksal2024muri,
  title={MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions},
  author={Abdullatif Köksal and Marion Thaler and Ayyoob Imani and Ahmet Üstün and Anna Korhonen and Hinrich Schütze},
  year={2024},
  eprint={2409.12958},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2409.12958},
}
```