---
base_model: google/mt5-small
datasets:
- syubraj/roman2nepali-transliteration
language:
- ne
- en
library_name: transformers
license: apache-2.0
metrics:
- bleu
tags:
- generated_from_trainer
model-index:
- name: romaneng2nep_v2
  results: []
---
romaneng2nep_v2
This model is a fine-tuned version of google/mt5-small on the syubraj/roman2nepali-transliteration dataset. It achieves the following results on the evaluation set:
- Loss: 2.9652
- Gen Len: 5.1538
Model Usage
!pip install transformers
from transformers import AutoTokenizer, MT5ForConditionalGeneration
checkpoint = "syubraj/romaneng2nep_v3"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = MT5ForConditionalGeneration.from_pretrained(checkpoint)
# Set max sequence length
max_seq_len = 20
def translate(text):
    # Tokenize the input text, truncating to at most max_seq_len tokens
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_seq_len)
    # Generate the output token IDs
    translated = model.generate(**inputs)
    # Decode the generated tokens back to text
    translated_text = tokenizer.decode(translated[0], skip_special_tokens=True)
    return translated_text
# Example usage
source_text = "muskuraudai" # Example Romanized Nepali text
translated_text = translate(source_text)
print(f"Translated Text: {translated_text}")
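The example above passes a single word, and the reported generation length of about 5 tokens suggests the model produces short outputs. Assuming the model is meant to be applied one word at a time (the source does not confirm this), a sentence could be handled by splitting on whitespace and processing each word separately. The following is a minimal sketch under that assumption: `transliterate_sentence` and the stand-in `fake_translate` are hypothetical names, and in practice the `translate` function defined above would be passed in instead.

```python
from typing import Callable


def transliterate_sentence(sentence: str, word_fn: Callable[[str], str]) -> str:
    """Apply a word-level transliteration function to each whitespace-separated word."""
    words = sentence.split()
    return " ".join(word_fn(word) for word in words)


def fake_translate(word: str) -> str:
    # Hypothetical stand-in so the sketch runs without downloading the model;
    # it only wraps the word in brackets and performs no transliteration.
    return f"[{word}]"


print(transliterate_sentence("ma  muskuraudai chu", fake_translate))
```

Calling `transliterate_sentence(text, translate)` would run the model once per word. This is simple but slow for long inputs, so batching the words through the tokenizer may be worth considering.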
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 24
- eval_batch_size: 24
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 4
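To illustrate what the `linear` scheduler type does to the learning rate of 2e-05, here is a small pure-Python sketch of a linear warmup and decay rule. The function name `linear_lr` is hypothetical, the warmup length is assumed to be 0 because none is listed, and `total_steps=32_000` is only a placeholder, since the source does not state the total number of training steps (the last logged step in the results is 31000).

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 2e-05, warmup_steps: int = 0) -> float:
    """Learning rate at `step` with linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# total_steps is an assumed placeholder, not a value from the model card.
total_steps = 32_000
for step in (0, 8_000, 16_000, 24_000, 32_000):
    print(f"step {step:>6}: lr = {linear_lr(step, total_steps):.2e}")
```

Under these assumptions the rate starts at its full value and falls steadily to zero by the final step, which matches the shrinking loss improvements in the later rows of the training results.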
Training results
Step | Training Loss | Validation Loss | Gen Len |
---|---|---|---|
1000 | 15.0703 | 5.6154 | 2.3840 |
2000 | 6.0460 | 4.4449 | 4.6281 |
3000 | 5.2580 | 3.9632 | 4.7790 |
4000 | 4.8563 | 3.6188 | 5.0053 |
5000 | 4.5602 | 3.3491 | 5.3085 |
6000 | 4.3146 | 3.1572 | 5.2562 |
7000 | 4.1228 | 3.0084 | 5.2197 |
8000 | 3.9695 | 2.8727 | 5.2140 |
9000 | 3.8342 | 2.7651 | 5.1834 |
10000 | 3.7319 | 2.6661 | 5.1977 |
11000 | 3.6485 | 2.5864 | 5.1536 |
12000 | 3.5541 | 2.5080 | 5.1990 |
13000 | 3.4959 | 2.4464 | 5.1775 |
14000 | 3.4315 | 2.3931 | 5.1747 |
15000 | 3.3663 | 2.3401 | 5.1625 |
16000 | 3.3204 | 2.3034 | 5.1481 |
17000 | 3.2417 | 2.2593 | 5.1663 |
18000 | 3.2186 | 2.2283 | 5.1351 |
19000 | 3.1822 | 2.1946 | 5.1573 |
20000 | 3.1449 | 2.1690 | 5.1649 |
21000 | 3.1067 | 2.1402 | 5.1624 |
22000 | 3.0844 | 2.1258 | 5.1479 |
23000 | 3.0574 | 2.1066 | 5.1518 |
24000 | 3.0357 | 2.0887 | 5.1446 |
25000 | 3.0136 | 2.0746 | 5.1559 |
26000 | 2.9957 | 2.0609 | 5.1658 |
27000 | 2.9865 | 2.0510 | 5.1791 |
28000 | 2.9765 | 2.0456 | 5.1574 |
29000 | 2.9675 | 2.0386 | 5.1620 |
30000 | 2.9678 | 2.0344 | 5.1601 |
31000 | 2.9652 | 2.0320 | 5.1538 |
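One way to read the table is to convert the validation loss to perplexity and check how much the loss still improves near the end of training. The sketch below does this for a few rows copied from the table. It assumes the reported loss is a mean per-token cross-entropy in nats, which the source does not state explicitly. The names `checkpoints` and `perplexity` are illustrative.

```python
import math

# (step, validation_loss) pairs copied from the table above.
checkpoints = [
    (1000, 5.6154),
    (10000, 2.6661),
    (20000, 2.1690),
    (30000, 2.0344),
    (31000, 2.0320),
]


def perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(loss)


for step, val_loss in checkpoints:
    print(f"step {step:>5}: val loss {val_loss:.4f}, perplexity {perplexity(val_loss):.2f}")

# Change in validation loss over the last logged 1000 steps.
last_delta = checkpoints[-2][1] - checkpoints[-1][1]
print(f"improvement from step 30000 to 31000: {last_delta:.4f}")
```

The small change over the final 1000 steps suggests the validation loss had largely plateaued by the end of the logged run.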
Framework versions
- Transformers 4.45.1
- Pytorch 2.4.0
- Datasets 3.0.1
- Tokenizers 0.20.0
Citation
If you find this model useful, please cite this work:
@misc {yubraj_sigdel_2024,
author = { {Yubraj Sigdel} },
title = { romaneng2nep_v3 (Revision dca017e) },
year = 2024,
url = { https://huggingface.co/syubraj/romaneng2nep_v3 },
doi = { 10.57967/hf/3252 },
publisher = { Hugging Face }
}