ToluClassics committed
Commit 6c82d3a • 1 Parent: e7b0b5d
update readme

README.md CHANGED
@@ -39,6 +39,24 @@ Afaan Oromoo(orm), Amharic(amh), Gahuza(gah), Hausa(hau), Igbo(igb), Nigerian Pi

- 143 Million Tokens (1GB of text data)
- Tokenizer Vocabulary Size: 70,000 tokens
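
A minimal sketch (not from the original card) of how one might verify the vocabulary figure above, using the standard `transformers` tokenizer API:

```python
>>> # Illustrative check: load the tokenizer and inspect its vocabulary size,
>>> # which the stats above put at 70,000 tokens.
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("castorini/afriteva_large")
>>> tokenizer.vocab_size  # expected to be on the order of 70,000
```
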
## Intended uses & limitations

The example below loads the model and tokenizer, tokenizes a source/target pair, and runs a forward pass:

```python
>>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("castorini/afriteva_large")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("castorini/afriteva_large")

>>> src_text = "Ó hùn ọ́ láti di ara wa bí?"
>>> tgt_text = "Would you like to be?"

>>> model_inputs = tokenizer(src_text, return_tensors="pt")
>>> with tokenizer.as_target_tokenizer():
...     labels = tokenizer(tgt_text, return_tensors="pt").input_ids

>>> model(**model_inputs, labels=labels)  # forward pass; returns loss and logits
```
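
Continuing the session above, a minimal inference sketch using `generate` (the decoding settings here are illustrative assumptions, not settings from the card):

```python
>>> # Illustrative sketch: decode output text from the same model_inputs.
>>> # num_beams and max_length are arbitrary example values.
>>> output_ids = model.generate(**model_inputs, num_beams=4, max_length=64)
>>> tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```
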
## Training Procedure

For information on training procedures, please refer to the AfriTeVa [paper](#) or [repository](https://github.com/castorini/afriteva).