---
base_model: facebook/nllb-200-1.3B
model-index:
- name: translate-nllb-1.3b-salt
  results: []
datasets:
- Sunbird/salt
---

# Model details

This machine translation model translates single sentences between any pair of the following languages:

| ISO 639-3 | Language name |
| --- | --- |
| eng | English |
| ach | Acholi |
| lgg | Lugbara |
| lug | Luganda |
| nyn | Runyankole |
| teo | Ateso |

It was trained on the [SALT](http://huggingface.co/datasets/Sunbird/salt) dataset and a variety of additional external data resources, including back-translated news articles, FLORES-200, MT560 and LAFAND-MT. The base model was [facebook/nllb-200-1.3B](https://huggingface.co/facebook/nllb-200-1.3B), with its tokens adapted to support languages that the base model did not originally include.

# Usage example

```python
import torch
import transformers

tokenizer = transformers.NllbTokenizer.from_pretrained(
    'Sunbird/translate-nllb-1.3b-salt')
model = transformers.M2M100ForConditionalGeneration.from_pretrained(
    'Sunbird/translate-nllb-1.3b-salt')

text = 'Where is the hospital?'
source_language = 'eng'
target_language = 'lug'

# Token IDs this model uses to mark each language
language_tokens = {
    'eng': 256047,
    'ach': 256111,
    'lgg': 256008,
    'lug': 256110,
    'nyn': 256002,
    'teo': 256006,
}

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

inputs = tokenizer(text, return_tensors="pt").to(device)
# Replace the first input token with the source-language token
inputs['input_ids'][0][0] = language_tokens[source_language]

translated_tokens = model.generate(
    **inputs,
    # Force the output to start with the target-language token
    forced_bos_token_id=language_tokens[target_language],
    max_length=100,
    num_beams=5,
)

result = tokenizer.batch_decode(
    translated_tokens, skip_special_tokens=True)[0]
print(result)
# Eddwaliro liri ludda wa?
```

# Evaluation metrics

BLEU scores on salt-dev:

| Source language | Target language | BLEU |
| --- | --- | --- |
| ach | eng | 28.371 |
| lgg | eng | 30.45 |
| lug | eng | 41.978 |
| nyn | eng | 32.296 |
| teo | eng | 30.422 |
| eng | ach | 20.972 |
| eng | lgg | 22.362 |
| eng | lug | 30.359 |
| eng | nyn | 15.305 |
| eng | teo | 21.391 |
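The salt-dev scores are reported per translation direction. One rough way to compare translation into English with translation out of English is an unweighted mean over each half of the results. The sketch below hard-codes the salt-dev figures from this card; `SALT_DEV_BLEU` and `mean_bleu` are hypothetical names used only for illustration, and an average of per-pair BLEU scores is a coarse summary, not a corpus-level BLEU score.

```python
from statistics import mean

# salt-dev BLEU scores from this model card, keyed by (source, target)
SALT_DEV_BLEU = {
    ("ach", "eng"): 28.371,
    ("lgg", "eng"): 30.45,
    ("lug", "eng"): 41.978,
    ("nyn", "eng"): 32.296,
    ("teo", "eng"): 30.422,
    ("eng", "ach"): 20.972,
    ("eng", "lgg"): 22.362,
    ("eng", "lug"): 30.359,
    ("eng", "nyn"): 15.305,
    ("eng", "teo"): 21.391,
}


def mean_bleu(scores, *, into=None, out_of=None):
    """Unweighted mean BLEU over pairs matching the target (`into`)
    and/or source (`out_of`) language filters (hypothetical helper)."""
    selected = [
        bleu
        for (src, tgt), bleu in scores.items()
        if (into is None or tgt == into) and (out_of is None or src == out_of)
    ]
    if not selected:
        raise ValueError("no language pairs match the filter")
    return mean(selected)


into_english = mean_bleu(SALT_DEV_BLEU, into="eng")
from_english = mean_bleu(SALT_DEV_BLEU, out_of="eng")
print(f"xx -> eng: {into_english:.2f}")  # xx -> eng: 32.70
print(f"eng -> xx: {from_english:.2f}")  # eng -> xx: 22.08
```

On these numbers, translation into English scores roughly 10 BLEU higher on average than translation out of English, with eng→nyn the weakest direction.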